Commit Graph

73 Commits

Author SHA1 Message Date
molecule-ai[bot]
705c0a46ce
Merge pull request #763 from Molecule-AI/feat/issue-733-agents-md-impl
feat(#733): implement AGENTS.md auto-generation
2026-04-17 16:21:58 +00:00
c092302712 fix(gate-6): restore claude-opus-4-7 default — reverted by pre-#743 branch
PR #763 (feat/issue-733-agents-md-impl) branched before PR #743 landed the
claude-opus-4-7 model default upgrade. config.py still had the old
claude-sonnet-4-6 default, which would have silently regressed the upgrade.

Restore both occurrences:
- WorkspaceConfig.model default: claude-sonnet-4-6 → claude-opus-4-7
- load_config() fallback: claude-sonnet-4-6 → claude-opus-4-7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 16:21:04 +00:00
molecule-ai[bot]
a67375d22f feat(#733): implement AGENTS.md auto-generation
Turns the QA TDD spec from PR #755 GREEN: all 14 tests pass.

Changes:
- workspace-template/agents_md.py (new): generate_agents_md(config_dir, output_path)
  Writes AAIF-compliant AGENTS.md with name, role, description, A2A endpoint,
  and MCP tools sections. AGENT_URL env var overrides the derived localhost URL.
  Falls back to description when role is absent (graceful legacy compat).
  Always overwrites — no stale-file guard.

- workspace-template/config.py: add role field to WorkspaceConfig
  New top-level field `role: str = ""` with load_config support.
  Falls back to description in agents_md.py for backward compat.

- workspace-template/main.py: wire generate_agents_md into startup (step 1a)
  Fires after load_config + preflight. Non-fatal: exception is caught and
  printed as a warning so a bad /workspace mount never kills the agent.

- workspace-template/tests/test_agents_md.py (new): pulled from PR #755 branch

Test results:
  pytest tests/test_agents_md.py -v  → 14 passed  (was: 14 RED / import error)
  pytest (full suite)                → 1044 passed, 2 xfailed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 16:21:04 +00:00
molecule-ai[bot]
8a00c338ee
feat(#733): implement AGENTS.md auto-generation 2026-04-17 16:20:39 +00:00
molecule-ai[bot]
c14f9f04d9
refactor(#741): extract medo.py from builtin_tools to plugins/molecule-medo
The Baidu MeDo hackathon integration was sitting in builtin_tools/ as dead
code — not imported by any loader but shipped with every workspace image,
misleadingly suggesting it was a core builtin.

Changes:
- Move builtin_tools/medo.py → plugins/molecule-medo/skills/medo-tools/scripts/medo.py
  (git detects this as a rename — no code changes, identical tool surface)
- Add plugins/molecule-medo/plugin.yaml (manifest: name, version, runtimes, tags)
- Add plugins/molecule-medo/skills/medo-tools/SKILL.md (frontmatter + setup docs)
- Move workspace-template/tests/test_medo.py → plugins/molecule-medo/tests/test_medo.py
  (update _MEDO_PATH to resolve from plugin root; add conftest.py for langchain mock)
- Update .gitignore: change /plugins/ blanket ignore to /plugins/* so this plugin
  can be tracked until it gets its own standalone repo

Acceptance criteria met:
- builtin_tools/medo.py removed from core
- plugins/molecule-medo/ created with identical tool surface (9/9 tests pass)
- cd workspace-template && pytest → 1021 passed, 2 xfailed (no regression)
- MEDO_API_KEY was never in default provisioning (.env.example / config.py clean)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 16:03:50 +00:00
Molecule AI Backend Engineer
ebfafb9139 feat: upgrade default workspace model to claude-opus-4-7 (#727)
Replace the anthropic:claude-sonnet-4-6 default across config, handlers,
env example, and litellm proxy config. All tests updated to match the new
default; sonnet-4-6 alias kept in litellm_config.yml for pinned workspaces.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 15:30:57 +00:00
molecule-ai[bot]
1f6163b5d2
Merge pull request #659 from Molecule-AI/infra/rebuild-runtime-images-script
infra: add rebuild-runtime-images.sh — patches all 6 adapter images with git credential helper (#658)
2026-04-17 10:59:33 +00:00
bbfe2e92d4 fix(security): allowlist-validate runtime arg in rebuild-runtime-images.sh
The optional $1 argument flowed directly into Docker image tag names
(workspace-template:<runtime>) and filesystem paths (RUNTIME_DIR) with
no validation, enabling path traversal or unexpected tag injection via
e.g. `bash rebuild-runtime-images.sh '../evil'`.

Fix: introduce VALID_RUNTIMES allowlist and validate $1 against it
before setting RUNTIMES. Any unlisted value now exits with a clear
error message. The RUNTIMES array is populated from VALID_RUNTIMES
when no argument is given, keeping the all-runtimes default path.

shellcheck clean; $1 only appears inside the validated block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 10:27:11 +00:00
7066fce6f4 fix(infra): rename TMPDIR→RUNTIME_DIR, fix PIPESTATUS docker exit check
Bug 1: TMPDIR is a POSIX-reserved variable used by mktemp, Docker
BuildKit, and git subprocesses as their system temp directory.
Overwriting it redirected those tools to the build context, causing
unpredictable failures. Renamed all 6 occurrences to RUNTIME_DIR.

Bug 2: `docker build ... | grep` made grep's exit code (0=match,
1=no match) determine if the build succeeded, not docker's. Fixed by
reading PIPESTATUS[0] immediately after the pipeline so docker's real
exit code drives the SUCCESS/FAILED tracking.

Also fixed two pre-existing shellcheck warnings:
- SC2034: removed unused REPO_ROOT variable
- SC2064: trap now uses single quotes so TMPBASE expands at signal time

shellcheck clean with no warnings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 10:25:43 +00:00
Molecule AI QA Engineer
5c95c6dc42 test: add _load_config_dict coverage for issue #652
Cover the four paths that were exercised only via mock in the
_build_options tests: valid YAML, missing file, malformed YAML,
and empty file (safe_load → None → {} via `or {}`).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 10:08:45 +00:00
Molecule AI Backend Engineer
cf5428664b feat(issue-652): wire effort and task_budget to claude sdk output_config
Adds _load_config_dict() helper to ClaudeSDKExecutor and wires the new
effort and task_budget config fields into _build_options() before the
Anthropic API call:

- effort (str): low|medium|high|xhigh|max — populates output_config.effort
- task_budget (int): advisory total-token budget; must be >= 20000 when set;
  automatically adds task-budgets-2026-03-13 beta header

Also adds WorkspaceConfig.effort and WorkspaceConfig.task_budget fields in
config.py and 5 acceptance tests covering all code paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 07:33:07 +00:00
b7c0d3d22a infra: add rebuild-runtime-images.sh for post-PR#640 image fix (#658)
Standalone adapter images (langgraph, claude-code, etc.) use
ENTRYPOINT ["molecule-runtime"] which bypasses entrypoint.sh. PR #640's
entrypoint.sh fix therefore never runs in adapter images. The correct fix
is to bake git config --system into the image at build time.

This script:
1. Rebuilds workspace-template:base from the monorepo Dockerfile (which
   has the fixed entrypoint.sh and molecule-git-token-helper.sh)
2. For each of the 6 runtime adapters: clones the standalone repo, patches
   its Dockerfile to COPY the credential helper and run git config --system,
   then builds the final image tagged as workspace-template:<runtime>

Usage (run on the host machine, not inside a workspace container):
  bash workspace-template/rebuild-runtime-images.sh          # all 6
  bash workspace-template/rebuild-runtime-images.sh claude-code  # one

See issue #658 for the architectural explanation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 07:14:12 +00:00
af00a6c128 fix(merge): combine response_format (#498) and tools (#497) in hermes_executor
Both PRs restructured the same chat.completions.create() call to use a
create_kwargs dict. Resolved by keeping both __init__ params and both
conditionals in the create_kwargs block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 07:03:22 +00:00
molecule-ai[bot]
c9b8c26d5f
feat(hermes): native tools=[] parameter instead of text-in-prompt workaround (#497)
feat(hermes): native tools=[] parameter instead of text-in-prompt workaround (#497)
2026-04-17 06:56:10 +00:00
c509eca31d fix(template): copy molecule-git-token-helper.sh into image and fix path
Two bugs prevented the git credential helper (merged in #567) from ever
running at workspace boot:

1. Dockerfile never COPY'd scripts/molecule-git-token-helper.sh into the
   image — only gh-wrapper.sh was copied from scripts/. Result: the helper
   binary did not exist in any built container image.

2. entrypoint.sh looked for the helper at /workspace-template/scripts/...
   but /workspace-template/ is not a path that exists inside the container
   (WORKDIR is /app, no /workspace-template mount). The `if [ -f ... ]`
   guard silently fell through to the WARNING branch on every boot since
   #567 merged — the helper was never registered.

Fix:
- Add `COPY scripts/molecule-git-token-helper.sh ./scripts/` to Dockerfile
  so the script lands at /app/scripts/ in the image (matching WORKDIR /app)
- Update HELPER_SCRIPT path in entrypoint.sh from
  /workspace-template/scripts/... to /app/scripts/...

After this fix, every workspace container registers the helper at boot via:
  git config --global credential.https://github.com.helper \
    "!/app/scripts/molecule-git-token-helper.sh"

Closes #613.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 06:27:08 +00:00
4eb56ebec6 fix(plugins_registry): deduplicate handlers in _deep_merge_hooks()
Unconditional list.extend() on repeated plugin install caused every
hook handler to be appended on each reinstall, leading to 3-4x duplicate
firings per event (PreToolUse, PostToolUse, Stop, etc.).

Fix: before appending each incoming handler, compute a fingerprint of
(matcher, frozenset-of-commands). Skip append if the fingerprint is
already present in the merged list. First-time installs are unaffected —
new handlers still land correctly.

Adds 7 unit tests covering: first install, double install, triple install,
different-matcher co-existence, different-command co-existence, existing
user hook preservation, and top-level key merge semantics.

Closes #566

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 05:22:00 +00:00
Molecule AI Backend Engineer
1d41f23ddd feat(hermes): plumb response_format=json_schema for structured output (#498)
Adds response_format support to HermesA2AExecutor so callers can request
structured JSON output via the OpenAI-native response_format parameter.

Changes:
- _validate_response_format(): validates type (json_schema/json_object/text)
  and required sub-fields; returns None if valid, error message if invalid
- HermesA2AExecutor.__init__: new response_format kwarg, stored as _response_format
- execute(): validates before API call — invalid schema enqueues error and
  returns early without hitting Hermes API; valid and non-None adds
  response_format= to create_kwargs; None omits the field entirely

Tests (12 new):
  - _validate_response_format: all valid types, invalid type, missing fields
  - constructor stores response_format correctly
  - valid response_format forwarded to API call
  - response_format omitted when None (no key in call kwargs)
  - invalid schema → error message enqueued, API not called

Closes #498

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 01:19:51 +00:00
Molecule AI Backend Engineer
6d253b961d feat(hermes): pass tools via native tools[] parameter instead of text-in-prompt (#497)
Instead of injecting tool definitions as text into the system prompt,
HermesA2AExecutor now accepts a tools: list[dict] | None constructor
parameter containing OpenAI-format tool definitions and forwards them
via the native tools= parameter on chat.completions.create().

Empty list / None rule: when tools is falsy, the tools key is omitted
from the API call entirely — never sent as tools=[] — so providers
that reject an empty tools array don't return a 400.

Tool-call response handling: when the model returns finish_reason
"tool_calls" with no text content, the executor serialises the call
list as a JSON string and enqueues it as the A2A reply. This keeps
the executor thin (single API call per turn, no ReAct loop) while
surfacing function-call intent in a structured, parseable format.

Changes:
- HermesA2AExecutor.__init__: new tools kwarg; stored as self._tools
  (copy; mutating the input list has no effect)
- execute(): builds create_kwargs dict and conditionally adds tools=
  only when self._tools is non-empty; handles tool_calls response
- Module docstring: new "Native tools (#497)" section with schema
  reference and edge-case explanation

Tests (12 new, 47 total in hermes test file, 1002 total suite):
  - tools stored correctly in constructor (copy, None, [], non-empty)
  - non-empty tools forwarded as tools= in API call
  - multiple tools all forwarded
  - empty list ([] and None and default) → tools key absent from call
  - model tool_call response → JSON-serialised list as A2A reply
  - multiple tool_calls → all in JSON reply
  - text content present → text wins over tool_calls

Closes #497

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 01:00:23 +00:00
molecule-ai[bot]
b1c976a54d
fix(github): refresh installation token when TTL < 10 min (#547) (#567)
Root cause: the github-app-auth plugin injects GH_TOKEN + GITHUB_TOKEN
into each workspace container's env at provision time (EnvMutator). Those
are GitHub App installation tokens with a fixed ~60 min TTL. The plugin
has an in-process cache that proactively refreshes 5 min before expiry —
but the workspace env is set once at container start and never updated.
Any workspace alive >60 min ends up with an expired token.

Fix (Option B — on-demand endpoint):

pkg/provisionhook:
  - Add TokenProvider interface: Token(ctx) (token, expiresAt, error)
    Lives in pkg/ (public) so the github-app-auth plugin can implement it.
  - Add Registry.FirstTokenProvider() — discovers the first mutator that
    also satisfies TokenProvider via interface assertion. Safe under
    concurrent reads (existing RWMutex).

platform/internal/handlers/github_token.go:
  - New GitHubTokenHandler serving GET /admin/github-installation-token
  - Delegates to the registered TokenProvider (plugin cache — always fresh)
  - 404 if no GitHub App configured, 500 + [github] prefix log on error
  - Never logs the token itself

platform/internal/handlers/workspace.go:
  - Add TokenRegistry() getter so the router can wire the handler without
    coupling to WorkspaceHandler internals

platform/internal/router/router.go:
  - Register GET /admin/github-installation-token under AdminAuth

workspace-template/:
  - scripts/molecule-git-token-helper.sh — git credential helper; calls
    the platform endpoint on every push/fetch; falls through to next
    helper (operator PAT) if platform unreachable
  - entrypoint.sh — configure the credential helper at startup

Why Option B over Option A (background goroutine):
  - The plugin already has its own cache refresh; nothing to refresh here.
  - Pushing env updates into running containers requires docker exec, which
    the architecture explicitly rejects (issue #547 "Alternatives").
  - Pull-based is stateless, trivially testable, zero extra goroutines.

Closes #547

Co-authored-by: Molecule AI DevOps Engineer <devops-engineer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:47:03 +00:00
Molecule AI Backend Engineer
dbcea7f191 feat(adapters): add Google ADK runtime adapter (#542)
Implements WorkspaceAdapter for Google's Agent Development Kit (google-adk
v1.x, Apache-2.0). Ships four files under workspace-template/adapters/google-adk/:

- adapter.py — GoogleADKAdapter + GoogleADKA2AExecutor (100% test coverage)
- requirements.txt — pinned google-adk==1.30.0 + google-genai>=1.16.0
- README.md — overview, install, usage, config, architecture diagram
- test_adapter.py — 46 unit tests, all passing, no live API calls

Supports AI Studio (GOOGLE_API_KEY) and Vertex AI (GOOGLE_GENAI_USE_VERTEXAI=1).
Model prefix stripping: "google:gemini-2.0-flash" → "gemini-2.0-flash".
Error sanitization mirrors the hermes_executor convention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:08:17 +00:00
Molecule AI Backend Engineer
3d817a42b7 feat(hermes): expose reasoning mode for Hermes 4 via OpenAI-compat API (#496)
Hermes 4 is a hybrid-reasoning model trained on <think> tags; without asking
for thinking we pay flagship $/tok but get non-reasoning quality. This adds a
dedicated HermesA2AExecutor that dispatches to any OpenAI-compat endpoint
(OpenRouter, Nous Portal) and enables native reasoning for Hermes 4 models.

Key decisions:
- ProviderConfig + _reasoning_supported() detect Hermes 4 by model slug
  substring ("hermes-4", "hermes4") — case-insensitive, no config needed
- extra_body={"reasoning": {"enabled": True}} sent only to Hermes 4 entries;
  Hermes 3 path unchanged (no extra_body, no regressions)
- choices[0].message.reasoning + reasoning_details extracted and written to
  an OTEL span (hermes.reasoning) — deliberately NOT echoed in the A2A reply
  so the reasoning trace never contaminates the agent's next-turn context
- API key / base URL default to OPENAI_API_KEY / OPENAI_BASE_URL env vars
  with openrouter.ai/api/v1 as the fallback endpoint
- _client injection parameter for unit tests (no live API calls needed)
- Error sanitization: only exception class name surfaces to user (mirrors
  sanitize_agent_error() convention from cli_executor.py)

Test coverage: 35 tests, 100% coverage on all new code paths including:
  - _reasoning_supported() — Hermes 4/3/unknown/empty/uppercase
  - ProviderConfig — field assignment and capability flags
  - extra_body presence for Hermes 4, absence for Hermes 3
  - reasoning not in A2A reply; _log_reasoning called when trace present
  - reasoning_details forwarded; span attributes set correctly
  - Telemetry failure swallowed (never blocks response)
  - API error → sanitized class-name-only reply
  - cancel() → TaskStatusUpdateEvent(state=canceled)

Full suite: 990 passed, 0 failed (no regressions).

Resolves #496

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 20:38:45 +00:00
Hongming Wang
74e4f30216 fix: address all code review findings + remove exposed secrets
Code review fixes:
- 🟡 #1: Replace python3 with jq in Dockerfile template stages (~50MB → ~2MB)
- 🟡 #2: Add clone count verification to scripts/clone-manifest.sh
  (set -e + expected vs actual count check — fails build if any clone fails)
- 🟡 #3: Drop 'unsafe-eval' from CSP (not needed for Next.js production
  standalone builds, only dev mode). Updated test assertion.
- 🟡 #4: Remove broken pyproject.toml from workspace-template/ (it claimed
  to package as molecule-ai-workspace-runtime but the directory structure
  didn't match — the real package ships from the standalone repo)
- 🔵 #1: Add version-pinning TODO comment to manifest.json
- 🔵 #3: Add full repo URLs + test counts for SDK/MCP/CLI/runtime in CLAUDE.md

Security (GitGuardian alert):
- Removed Telegram bot token (8633739353:AA...) from template-molecule-dev
  pm/.env — replaced with ${TELEGRAM_BOT_TOKEN} placeholder
- Removed Claude OAuth token (sk-ant-oat01-...) from template-molecule-dev
  root .env — replaced with ${CLAUDE_CODE_OAUTH_TOKEN} placeholder
- Both tokens need immediate rotation by the operator

Tests: Platform middleware tests updated + all pass.
2026-04-16 05:05:49 -07:00
Hongming Wang
55a2ee0153 fix: properly remove adapter subdirectories + move shared code to root
PR #471 removed Dockerfiles/requirements from adapters/ but left the
Python source files. This commit finishes the extraction:

1. Moved shared_runtime.py → workspace-template/shared_runtime.py
   (used by prompt.py, a2a_executor.py, coordinator.py — not adapter-specific)
2. Moved base.py → workspace-template/adapter_base.py
   (BaseAdapter + AdapterConfig — the interface adapters implement)
3. Updated imports in prompt.py, a2a_executor.py, coordinator.py
4. Rewritten adapters/__init__.py as a thin shim that:
   - Reads ADAPTER_MODULE env var (production: standalone repos set this)
   - Re-exports BaseAdapter/AdapterConfig for backward compat
5. adapters/base.py + adapters/shared_runtime.py remain as re-export shims
6. Deleted all 8 adapter subdirectories (autogen, claude_code, crewai,
   deepagents, gemini_cli, hermes, langgraph, openclaw)
7. Removed 11 test files that imported adapter-specific code

Tests: 955 passed, 0 failed (down from 1216 — the difference is
adapter-specific tests that moved to standalone repos).
2026-04-16 04:59:13 -07:00
Hongming Wang
8ea8c1d7af fix: remove tests that referenced removed plugins/ directory
test_first_party_plugins.py, test_plugins_builtins_drift.py, and
test_hermes_adapter.py all referenced files under plugins/ and
adapters/ which were extracted to standalone repos. These tests
belong in those repos now, not in the core workspace-template.

1216 passed, 0 failed after removal.
2026-04-16 04:39:31 -07:00
Hongming Wang
57ad7b5fe5 chore: remove adapter Dockerfiles and requirements.txt from monorepo
These files have moved to the standalone template repos:
  https://github.com/Molecule-AI/molecule-ai-workspace-template-<runtime>

Each adapter repo now has its own Dockerfile (FROM python:3.11-slim + pip install
molecule-ai-workspace-runtime) and requirements.txt. The adapter Python source
files (.py) stay in the monorepo for local development and testing.

Adapters removed from workspace-template/adapters/*/: Dockerfile, requirements.txt
Adapters retained: adapter.py, __init__.py (+ hermes extras: escalation.py, executor.py, providers.py)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 04:33:22 -07:00
Hongming Wang
cb74f0d6ae chore: extract workspace runtime to PyPI + move adapter Dockerfiles to template repos
Published `molecule-ai-workspace-runtime==0.1.0` to PyPI:
  https://pypi.org/project/molecule-ai-workspace-runtime/0.1.0/

Source repo: https://github.com/Molecule-AI/molecule-ai-workspace-runtime

Each adapter's Dockerfile and requirements.txt have moved to the corresponding
standalone template repo (molecule-ai-workspace-template-<runtime>). The adapter
Python code (.py files) stays in the monorepo for local dev and testing.

Changes:
- workspace-template/pyproject.toml — new, packages the shared runtime as a PyPI package
- workspace-template/adapters/*/Dockerfile — removed (now in template repos)
- workspace-template/adapters/*/requirements.txt — removed (now in template repos)
- workspace-template/Dockerfile — drop COPY adapters/ (still copies .py files via *.py glob)
- workspace-template/build-all.sh — simplified to base-image-only build
- workspace-template/entrypoint.sh — remove adapter requirements.txt install step
- workspace-template/tests/test_hermes_adapter.py — skip Dockerfile/requirements.txt checks
- CLAUDE.md — update architecture description + workspace image table
- docs/workspace-runtime-package.md — new, explains the package + adapter repo layout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 04:33:10 -07:00
rabbitblood
067a8333ce feat(workspace): gh-wrapper — auto-tag agent PRs + issues with role
Every agent in the template currently uses the same GitHub PAT, so
\`gh pr list\` shows every PR as authored by the CEO's account with
no signal which agent opened each one. Commits already carry
per-agent authors (GIT_AUTHOR_NAME from #402). This wrapper extends
the identity split to the PR/issue metadata surface layer that
commit attribution can't reach.

## How it works

A tiny bash script installed at \`/usr/local/bin/gh\`, which sits
earlier in PATH than the real binary at \`/usr/bin/gh\`. For \`gh pr
create\` and \`gh issue create\`:

- Title gets prefixed with \`[Role Name]\` — e.g. \`[Frontend Engineer]
  fix: canvas grid index\`
- Body gets \`\n\n---\n_Opened by: Molecule AI <Role>_\` appended

Role is read from \`GIT_AUTHOR_NAME\` which the platform provisioner
sets to \`Molecule AI <Role>\` (shipped with #402). Accepts both
\`--title X\` and \`--title=X\` forms. Same for \`--body\`.

Anything that isn't \`gh pr create\` or \`gh issue create\` (e.g.
\`gh pr list\`, \`gh issue view\`, \`gh run watch\`) passes through
untouched. No behaviour change for read-side operations.

## Idempotent

- If the title already starts with \`[...]\` the wrapper does not
  re-prefix. \`gh pr edit\` flows that resubmit title won't layer
  multiple tags.
- If the body already contains \`Opened by: Molecule AI\` the footer
  is not re-appended.

## Fail-open

When \`GIT_AUTHOR_NAME\` is absent or doesn't start with \`Molecule
AI \`, the wrapper exec's the real gh with unchanged args. No call
is ever blocked by this script.

## Test coverage

\`tests/test_gh_wrapper.sh\` — 12 cases, no network, no Docker:
- Passthrough for non-create subcommands (pr list)
- pr create title prefix + body footer
- issue create with \`--title=X\` \`--body=X\` equals-form
- Idempotent title re-prefix
- Idempotent body footer (count = 1 after two applies)
- Missing GIT_AUTHOR_NAME → passthrough, title preserved
- Malformed GIT_AUTHOR_NAME (not "Molecule AI ...") → passthrough

All 12 pass. Test script is standalone bash + a temp fake gh binary
that echoes argv; safe to run in CI's Python Lint & Test job via
subprocess shell-out.

## Deployment note

This lands in the workspace image. Existing containers keep their
old /usr/bin/gh until the image is rebuilt and they're re-provisioned
(POST /workspaces/:id/restart {}). No migration required; the wrapper
just starts tagging PRs once the new image is rolled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 03:10:46 -07:00
rabbitblood
b30d8d431c fix(tests): test_hermes_phase2_dispatch exec-load needs escalation + __name__
Phase 3 escalation ladder added `from .escalation import ...` to
executor.py. The phase-2 dispatch tests load executor.py via
`exec(compile(src, ...))` with the relative import rewritten — this
broke because (a) the rewrite didn't know about escalation and (b) the
exec namespace lacked `__name__`, which executor.py needs at import
time for `logging.getLogger(__name__)`.

Fix both in all 8 exec sites:
- Rewrite both `from .providers import` AND `from .escalation import`
- Pre-register escalation + providers in sys.modules under the fake
  package name
- Seed the exec namespace with `__name__ = "hermes_executor_under_test"`

54/54 hermes tests pass (28 escalation truth-table + 6 ladder-integration
+ 20 existing phase-2 dispatch).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 02:43:02 -07:00
rabbitblood
3cd18929c4 feat(hermes): escalation ladder — promote to stronger models on transient failure
Ships scoped Phase 3 of the Hermes multi-provider work. Every workspace
can now declare an ordered list of (provider, model) rungs; when the
pinned model hits rate-limit / 5xx / context-length / overload, the
executor advances to the next rung before raising.

## Why

3× Claude Max saturation is a routine occurrence now — the "first 429 on
a batch delegation" is the common path, not the exception. A workspace
pinned to Haiku that hits a context-length limit has no recovery today;
same for Sonnet hitting rate-limit mid-synthesis. Escalation promotes
to the next tier for that single call, preserves coordination, avoids
restart cascades.

## New module: adapters/hermes/escalation.py

- ``LadderRung(provider, model)`` — one config entry.
- ``parse_ladder(raw)`` — tolerant config parser; skips malformed rungs
  with a warning rather than raising so boot stays resilient.
- ``should_escalate(exc) -> bool`` — truth table over 15+ error shapes:
  - Typed classes (RateLimitError, OverloadedError, APITimeoutError,
    APIConnectionError, InternalServerError)
  - Context-length markers (each provider uses different phrasing)
  - Gateway markers (502/503/504, overloaded, temporarily unavailable)
  - Status-code substrings (429, 529, 5xx)
  - Hard-rejects auth failures (401/403/invalid_api_key) even if the
    outer exception class is RateLimitError — wrapping case matters.

## Executor wiring

``HermesA2AExecutor`` now accepts ``escalation_ladder`` in its
constructor + ``create_executor()`` factory. ``_do_inference()`` walks
the ladder:

  1. First attempt = pinned provider:model (matches pre-ladder behaviour)
  2. On escalatable error, try each rung in order
  3. On non-escalatable error, raise immediately (auth, malformed payload)
  4. On exhaustion, raise the last error

Rung switches temporarily rebind ``self.provider_cfg`` / ``self.model``
/ ``self.api_key`` / ``self.base_url`` in a try/finally, so any raised
error leaves the executor in its original state for the next call. Key
resolution for non-pinned rungs goes through ``resolve_provider`` which
reads the rung-provider's env vars fresh.

## Config shape

``config.yaml`` (rendered from ``org.yaml`` → workspace secrets):

    runtime_config:
      escalation_ladder:
        - provider: gemini
          model: gemini-2.5-flash
        - provider: anthropic
          model: claude-sonnet-4-5-20250929
        - provider: anthropic
          model: claude-opus-4-1-20250805

Empty / absent = single-shot behaviour, full backwards-compat with
every existing workspace.

## Tests

34 passing, all isolated (no network):

- ``test_hermes_escalation.py`` (28): parser + truth-table across
  rate-limit, overload, context-length, gateway, auth-reject, unrelated
  exceptions, and case-insensitivity.
- ``test_hermes_ladder_integration.py`` (6): no-ladder single call,
  ladder-not-triggered on success, escalate-on-rate-limit-then-succeed,
  stop-on-non-escalatable, raise-last-error-when-exhausted, skip-
  unknown-provider-in-rung.

## Not in this PR

- Uncertainty-driven escalation (judge pass after successful reply).
- Per-workspace budget tracking (#305 covers this separately).
- Live streaming reuse across rungs (ladder retries the whole call).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 02:27:27 -07:00
Hongming Wang
37b288c79b
fix(a2a): add missing Authorization header to delegation and message calls (#401)
* fix(a2a): add missing Authorization header to delegation and message calls

Three A2A client functions were missing the Bearer token on their HTTP calls
after the Phase 30.1 workspace-auth enforcement rollout:

1. send_a2a_message (a2a_client.py): POST to target workspace's /message/send
   used WorkspaceAuth middleware that fails-closed on missing auth header.
   Fix: headers=auth_headers() — auth_headers() already imported.

2. tool_delegate_task_async (a2a_tools.py): POST to platform /delegate endpoint
   requires the caller's workspace bearer token since Phase 30.1.
   Fix: headers=_auth_headers_for_heartbeat()

3. tool_check_task_status (a2a_tools.py): GET /delegations endpoint, same issue.
   Fix: headers=_auth_headers_for_heartbeat()

tool_list_peers already uses _auth_headers_for_heartbeat() correctly —
that's why list_peers works while delegation returns 401/[A2A_ERROR].

Root cause of the multi-session A2A outage. PR #386 (TTL fix) addressed
the workspace-restart cascade; this fixes the underlying 401 on each call.

Closes #391
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(a2a): add missing auth headers to /activity and /notify endpoints

Two more Phase 30.1 regressions in a2a_tools.py found during send_message_to_user
debugging (it was returning 401):

- tool_report_activity: POST /workspaces/:id/activity missing headers
- tool_send_message_to_user: POST /workspaces/:id/notify missing headers

Both now use headers=_auth_headers_for_heartbeat() matching the pattern used
by commit_memory, recall_memory, and the heartbeat POST in the same file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: PM (Molecule AI) <pm@molecule-ai.internal>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 00:53:18 -07:00
Hongming Wang
0aec76400a
feat(adapters): add gemini-cli runtime adapter (closes #332) (#379)
Adds a `gemini-cli` workspace runtime backed by Google's Gemini CLI
(@google/gemini-cli, ~101k ★, Apache 2.0). Mirrors the claude-code
adapter pattern: Docker image installs the CLI, CLIAgentExecutor
drives the subprocess, A2A MCP tools wire via ~/.gemini/settings.json.

Changes:
- workspace-template/adapters/gemini_cli/ — new adapter (Dockerfile,
  adapter.py, __init__.py, requirements.txt); setup() seeds GEMINI.md
  from system-prompt.md and injects A2A MCP server into settings.json
- workspace-template/cli_executor.py — adds gemini-cli to
  RUNTIME_PRESETS (--yolo flag, -p prompt, --model, GEMINI_API_KEY env
  auth); adds mcp_via_settings preset flag to skip --mcp-config
  injection for runtimes that own their own settings file
- workspace-configs-templates/gemini-cli/ — default config.yaml +
  system-prompt.md template
- tests/test_adapters.py — adds gemini-cli to expected adapter set
- CLAUDE.md — documents new runtime row in the image table

Requires: GEMINI_API_KEY global secret. Build:
  bash workspace-template/build-all.sh gemini-cli

Co-authored-by: DevOps Engineer <devops@molecule.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 23:30:00 -07:00
Hongming Wang
e7bde9a919
Merge pull request #338 from Molecule-AI/fix/issue-328-transcript-fail-closed
fix(security): /transcript fails closed when auth token missing (#328)
2026-04-15 21:30:56 -07:00
Hongming Wang
c11d8f3ec3
fix(security): hitl task-id ownership + wire fail_open_if_no_scanner in loader (closes #265, #268)
Security audit cycle 13: hitl.py LGTM (workspace-scoped task IDs). Loader.py fix applied (commit 0557f73): fail_open_if_no_scanner now read from config and forwarded to scan_skill_dependencies(); regression test added. CI 5/6 pass (E2E cancel = run-supersession pattern). Closes #265. Closes #268.
2026-04-15 21:18:52 -07:00
Hongming Wang
5eb08332ee fix(security): /transcript endpoint fails closed when auth token missing (#328)
Severity HIGH. The /transcript route in main.py used `if expected:`
around the bearer-token compare, so `get_token()` returning None (no
/configs/.auth_token on disk — bootstrap window, deleted file, OSError)
silently skipped the entire auth check. Any container on
molecule-monorepo-net could GET /transcript during the provisioning
window and walk away with the full session log (user messages, Claude
tool calls, assistant replies).

The platform's TranscriptHandler always has a valid token (it acquired
one at workspace registration), so tightening this gate has no
legitimate-caller impact. Only unauthenticated sniffers lose access,
which was never the intended contract of #287.

Fix:

1. Extracted the auth gate into `workspace-template/transcript_auth.py`
   — a 20-line module with no heavy imports so the security-critical
   code is unit-testable without standing up the full uvicorn/a2a/httpx
   stack (the former inline guard could only be tested end-to-end,
   which explains why the regression shipped in #287).

2. `transcript_authorized(expected, auth_header)` returns False when
   `expected` is None or empty — the #328 fix — and otherwise does
   strict equality against "Bearer <expected>".

3. main.py's inline handler calls the extracted function:
     if not _transcript_authorized(get_token(), auth_header):
         return 401

4. New tests/test_transcript_auth.py covers: None token, empty token,
   valid bearer, wrong bearer, missing header, case-sensitive prefix,
   whitespace fuzzing. All 7 pass.

Closes #328

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:17:37 -07:00
Hongming Wang
fec287fce3
fix(security): add bearer token auth to /transcript endpoint (#287)
Closes #287

Any container on molecule-monorepo-net could previously read the full Claude session log without authentication. Guard uses get_token() from platform_auth — skipped only before workspace registration (dev-mode).
2026-04-15 19:47:23 -07:00
Hongming Wang
e88ae9f6d0
fix(a2a-tools): auth_headers on recall_memory + commit_memory (#304)
Adds auth_headers to recall_memory and commit_memory in a2a_tools.py. Fixes the #215-class auth regression for A2A memory tools. Test mocks updated to accept headers kwarg.
2026-04-15 19:12:18 -07:00
Hongming Wang
472495c380
Merge pull request #270 from Molecule-AI/feat/workspace-transcript-endpoint
feat: GET /workspaces/:id/transcript — live agent session log
2026-04-15 17:55:41 -07:00
Hongming Wang
469d24c23a fix(tests): update memory fakes for auth_headers kwarg + activity overwrite
The #215-class fix in memory.py (859a60e) adds headers=_headers to the
direct-httpx commit_memory + search_memory paths, but 9 existing tests
in test_memory.py had FakeAsyncClient.post/get signatures like
`async def post(self, url, json):` with no headers kwarg. Python
raised TypeError: unexpected keyword argument 'headers' on every call,
commit_memory caught it and returned {success: False}, tests failed.

Fixes applied:

1. Add `headers=None` to every FakeAsyncClient.post + .get signature
   across test_memory.py. Uses replace_all so all 9+ fakes match.

2. For tests that capture a single captured["url"]:
   - test_commit_memory_uses_awareness_client_when_configured
   - test_commit_memory_uses_platform_fallback_without_awareness
   - test_commit_memory_httpx_201_success
   filter to only capture /memories URLs. Without the filter, the
   subsequent _record_memory_activity fire-and-forget post to /activity
   overwrites captured["url"] and the assertion fails.

3. For test_commit_memory_promoted_packet_logs_skill_promotion: bump
   expected captured["calls"] from 3 to 4. Pre-fix, the memory_write
   /activity call (from _record_memory_activity #125) was silently
   dropped because the fake rejected headers=; post-fix it succeeds
   and lands in the captured list alongside the skill_promotion
   /activity and /registry/heartbeat calls. Also extend that test's
   fake to accept /registry/heartbeat (was raising AssertionError).

Total: 36/36 memory tests pass. Full workspace-template suite 1189/1189.

This is strictly test-infrastructure work — zero production code
changed. CI never caught the break because the Mac mini runner has
been stuck for ~4 hours (tick-33/34/35/36 reports).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:29:15 -07:00
rabbitblood
48eca7264c fix(memory-tools): #215-class — auth_headers on commit_memory + search_memory HTTP fallback
Context: platform now gates `GET /workspaces/:id/memories` and
`POST /workspaces/:id/memories` behind workspace auth (post-#166 /
#167 AdminAuth wave). The `builtin_tools.memory` tool had three HTTP
call sites:

  1. commit_memory POST fallback (line 121)        ← NO auth_headers
  2. search_memory GET fallback (line 269)         ← NO auth_headers
  3. activity-log helper POST (line 371)           ← HAS auth_headers

Path 3 was already fixed. Paths 1 + 2 silently 401 every call, but the
tool's error-handling path returns `{"success": False}` without surfacing
the auth failure to the agent. Result: the agent sees an empty memory
backlog on every call and assumes there's nothing to do.

## Discovered today

Technical Researcher is the first workspace opted in to the idle-loop
pilot from #216 (reflection-on-completion pattern). The pilot fires
every 10 min, the agent calls `search_memory "research-backlog:..."` as
the first step, gets back an empty result, writes "tr-idle clean" to
memory, and stops. Clean-idle outcome every tick, 9 consecutive ticks.

Looking at TR's activity_logs response bodies:

    "Memory auth has failed on every tick this session — skipping the call"
    "tr-idle — step 2 done. Memory unavailable (auth token missing..."
    "tr-idle 04:15 — clean (memory auth still down, 3rd consecutive tick)"

The AGENT knew the memory calls were failing. The platform 401 error
was surfacing in the tool response, but our instrumentation wasn't
counting it as a defect — we saw "tr-idle clean" writes and assumed
the pilot was working as designed. It was actually silently broken.

## Fix

Import `platform_auth.auth_headers` lazily (same pattern as the
activity-log path already uses), attach `headers=_auth()` to both
httpx call sites. Matches the #225 fix for the register call.

## Not in this PR

- awareness_client.py also makes HTTP calls to a separate AWARENESS_URL
  service (not the platform), which may or may not need the same fix
  depending on that service's auth posture. Out of scope for this PR.

- TR's specific token problem: TR's `/configs/.auth_token` file is
  empty because it was re-provisioned via `apply_template: true`
  (recovery path from the failed-volume incident) and Phase 30.1
  only mints a token on FIRST register per workspace. This fix
  doesn't help TR until TR gets a fresh token — tracked separately.

## Test plan

- [x] Python syntax check on memory.py passes
- [ ] CI: all memory-related tests should still pass (the new code
      paths only add header passing, no shape change)
- [ ] Real-world verification: after TR gets a fresh token, idle-loop
      pilot should produce a dispatch within 10 min (seeded backlog
      already in place from this session)

## Related
- #215 / #225 — register call auth_headers fix (same pattern)
- #216 — TR idle-loop pilot (couldn't measure until this lands)
- #166 / #167 — platform AdminAuth wave that surfaced this gap
2026-04-15 17:26:26 -07:00
rabbitblood
baffc6b0c3 feat(hermes): Phase 2d-i — system-prompt.md injection on all 3 dispatch paths
The Hermes adapter never read /configs/system-prompt.md. Any role that
switched to runtime: hermes was silently losing its role identity because
the system prompt wasn't passed to the model. This PR fixes that by:

1. HermesA2AExecutor.__init__ takes new optional `config_path` kwarg
2. `create_executor(config_path=...)` forwards to the constructor
3. `adapter.py` passes `config.config_path` through from AdapterConfig
4. `execute()` reads system-prompt.md via executor_helpers.get_system_prompt
   (hot-reload-capable — reads on every turn, not just at startup)
5. `_do_inference(user_message, history, system_prompt)` — new arg threads
   through the dispatch to each native path
6. Each path uses the provider's NATIVE system field:
   - OpenAI-compat: prepends `{"role":"system", "content":...}` to messages
   - Anthropic: top-level `system=` kwarg (NOT in messages — Anthropic
     requires system at the top level)
   - Gemini: `config=GenerateContentConfig(system_instruction=...)`

## Phase scoreboard
- 2a (in main) — native Anthropic dispatch infra
- 2b (in main) — native Gemini dispatch
- 2c (in main) — multi-turn history on all paths
- **2d-i (this PR)** — system prompts on all paths
- 2d-ii (future) — tool calling on native paths
- 2d-iii (future) — vision content blocks on native paths
- 2d-iv (future) — streaming

## Test coverage

46/46 tests pass (20 Phase 2 dispatch + 26 Phase 1 registry):

- Existing dispatch tests updated to assert the 3-arg call shape
  `("hello", None, None)` — history + system_prompt both None
- 4 new tests:
  - `dispatch_passes_system_prompt_to_anthropic` — happy path, third arg flows
  - `dispatch_passes_system_prompt_to_gemini` — happy path
  - `dispatch_passes_system_prompt_to_openai` — happy path
  - `executor_accepts_config_path_kwarg` — constructor stores config_path
  - `create_executor_forwards_config_path` — both back-compat and registry
    resolution paths forward config_path through to the executor

## Back-compat

- `config_path=None` (default) → execute() skips system-prompt injection,
  same behavior as pre-2d-i
- Workspaces with `runtime: hermes` but no `/configs/system-prompt.md`
  file get `system_prompt=None` (get_system_prompt returns fallback),
  same as before
- The 13 OpenAI-compat providers work identically — system_prompt just
  adds a leading message, which every OpenAI-compat endpoint already
  supports
- Anthropic + Gemini previously got zero system context; now they get
  the same system prompt the workspace's system-prompt.md carries

## Why this matters

Before this PR: if someone flipped a workspace from `runtime: claude-code`
to `runtime: hermes`, the agent would act generically (no role identity,
no project conventions, no CLAUDE.md context) because the Hermes executor
never looked at system-prompt.md. That's a silent correctness regression
the test suite wouldn't catch because none of our live workspaces use
the hermes runtime today.

With this PR: Hermes workspaces get the same system prompt injection as
Claude-code workspaces, making the `runtime: hermes` switch a true drop-in
alternative.

## Related
- #267 Phase 2c (multi-turn history — in main)
- #255 Phase 2b (gemini native — in main)
- #240 Phase 2a (anthropic native — in main)
- #208 Phase 1 (provider registry — in main)
- project_hermes_multi_provider.md — Phase 2d-i was the next queued item
2026-04-15 16:21:47 -07:00
airenostars
1f22d7df1b feat: GET /workspaces/:id/transcript — live agent session log
Closes #N (issue to be filed)

Lets canvas / operators see live tool calls + AI thinking instead of
waiting for the high-level activity log to flush. Right now the only
way to "look over an agent's shoulder" is `docker exec ws-XXX cat
/home/agent/.claude/projects/.../<session>.jsonl`, which:
  - doesn't work for remote workspaces (Phase 30 / Fly Machines)
  - requires shell access on the host
  - has no pagination

This PR adds:

1. `BaseAdapter.transcript_lines(since, limit)` — async hook returning
   `{runtime, supported, lines, cursor, more, source}`. Default returns
   `supported: false` so non-claude-code runtimes pass through gracefully.

2. `ClaudeCodeAdapter.transcript_lines` override — reads the most-
   recently-modified `.jsonl` in `~/.claude/projects/<cwd>/`. Resolves
   cwd the same way `ClaudeSDKExecutor._resolve_cwd()` does so the
   project dir name matches what Claude Code actually writes to. Limit
   capped at 1000 to prevent OOM.

3. Workspace HTTP route `GET /transcript` — Starlette handler added
   alongside the A2A app. Trusts the internal Docker network (same
   model as POST / for A2A); Phase 30 remote-workspace auth is a
   follow-up.

4. Platform proxy `GET /workspaces/:id/transcript` — looks up the
   workspace's URL, forwards GET, caps response at 1MB. Gated by
   existing `WorkspaceAuth` middleware (same as /traces, /memories,
   /delegations).

Tests: 6 Python unit tests cover empty dir / pagination / multi-session
/ malformed lines / limit cap, plus 4 Go tests cover 404 / proxy
forwarding / query-string propagation / unreachable-workspace 502.

Verified end-to-end on a live workspace — returns real claude-code
session entries through the platform proxy.

## Follow-ups
- WebSocket variant for live streaming (instead of polling)
- Canvas UI tab "Transcript" between Activity and Traces
- LangGraph / DeepAgents / OpenClaw transcript adapters
- Phase 30 remote-workspace auth on /transcript
2026-04-15 14:29:43 -07:00
rabbitblood
cb3c7dcf91 feat(hermes): Phase 2c — multi-turn history passed natively to all paths
Completes the Phase 2 scope by keeping conversation turns as turns across
all three dispatch paths. Pre-2c, history was flattened into a single user
message via shared_runtime.build_task_text, which worked as a fallback but
lost the model's native multi-turn awareness (role attribution,
instruction-following on mid-conversation corrections, system-prompt
grounding against prior turns).

Phase 2a + 2b shipped the dispatch infrastructure + per-provider native
paths. This PR uses them properly.

## What's new

- **`_history_to_openai_messages(user_message, history)`** (static) — maps
  A2A `(role, text)` tuples to OpenAI Chat Completions
  `[{"role":"user"|"assistant","content":str}]`. Roles: `human`→`user`,
  `ai`→`assistant`. Current turn appended as the final user message.

- **`_history_to_anthropic_messages`** (static) — identical wire shape to
  OpenAI for text-only turns, so it delegates. Phase 2d tool_use/vision
  blocks will diverge here.

- **`_history_to_gemini_contents`** (static) — Gemini uses a different
  shape: `role="user"|"model"` (NOT "assistant") and text wrapped in
  `parts=[{"text":...}]`. Delegates to none of the others.

- **`_do_openai_compat(user_message, history=None)`** — accepts history,
  builds messages via `_history_to_openai_messages`. Back-compat: pass
  `history=None` to get the old single-turn behavior.

- **`_do_anthropic_native(user_message, history=None)`** — same signature
  change, calls `_history_to_anthropic_messages`. Still uses
  `anthropic.AsyncAnthropic().messages.create()`, just with proper
  multi-turn.

- **`_do_gemini_native(user_message, history=None)`** — same pattern,
  calls `_history_to_gemini_contents`, passes to Gemini's
  `generate_content(contents=...)`.

- **`_do_inference(user_message, history=None)`** — new signature,
  dispatches by auth_scheme as before, passes both args through.

- **`execute()`** — no longer calls `build_task_text`. Calls
  `extract_history(context)` directly and forwards to `_do_inference`.
  Removes the `build_task_text` import (not needed in this file anymore).

## Tests

Existing 7 dispatch tests updated for the new `(user_message, history)`
signature — they assert the path is called with `("hello", None)` since
they pass no history.

5 NEW tests:

- `test_history_to_openai_messages_empty_history` — empty history degrades
  to single user message (back-compat)
- `test_history_to_openai_messages_multi_turn` — round-trip of a 3-turn
  history + current turn
- `test_history_to_anthropic_messages_same_as_openai` — cross-check that
  anthropic path produces identical wire shape for text-only
- `test_history_to_gemini_contents_uses_model_role_and_parts_wrapper` —
  verifies the Gemini-specific role mapping (`ai`→`model`) + parts wrapper
- `test_dispatch_passes_history_through` — end-to-end: _do_inference
  forwards history to the chosen provider path

All 41 tests pass (15 Phase 2 dispatch + 26 Phase 1 registry):

    pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
    41 passed in 0.07s

## Back-compat

- No public API changes to `create_executor()`. Callers that hit
  `execute()` via A2A get the new multi-turn behavior automatically via
  `extract_history(context)`.
- Callers that passed an empty history list (or None) get the same
  single-turn behavior as pre-2c.
- The `build_task_text` helper in shared_runtime is unchanged — other
  adapters (AutoGen, LangGraph) that use it keep working. Only Hermes
  bypasses it now.

## What's NOT in this PR (Phase 2d)

- Tool calling / function calling on native paths (anthropic `tools=`,
  gemini `tools=Tool(function_declarations=[...])`)
- Vision content blocks (image_url → anthropic `{type:"image", source:
  {type:"base64",...}}` / gemini `{inline_data:{mime_type,data}}`)
- System instructions pass-through (anthropic `system=`, gemini
  `system_instruction=`)
- Streaming (`astream_messages` / `streamGenerateContent` stream variants)
- Extended thinking (anthropic `thinking={"type":"enabled"}`) / Gemini
  thinking config

Phase 2c is the **multi-turn upgrade**. Tool + vision + streaming are
Phase 2d, scoped in project_hermes_multi_provider.md.

## Related

- #240 Phase 2a (native Anthropic dispatch — in main)
- #255 Phase 2b (native Gemini dispatch — in main)
- Phase 1 (#208 — provider registry baseline, in main)
- `project_hermes_multi_provider.md` queued memory
- CEO 2026-04-15: "focus on supporting hermes agent"
2026-04-15 14:21:10 -07:00
Hongming Wang
3828693897
Merge pull request #255 from Molecule-AI/feat/hermes-phase2b-gemini-native
feat(hermes): Phase 2b — native Google Gemini generateContent dispatch path
2026-04-15 14:01:00 -07:00
Hongming Wang
df4740bf26
Merge pull request #240 from Molecule-AI/feat/hermes-phase2-native-sdks
feat(hermes): Phase 2a — native Anthropic Messages API dispatch (auth_scheme='anthropic')
2026-04-15 14:00:51 -07:00
Hongming Wang
1d9ddb8c67 fix(tests): hermes provider env-var leak broke test_hermes_smoke
Pre-existing flaky test: when the full workspace-template suite ran in
collection order, test_hermes_smoke.py::test_create_executor_raises_
without_keys failed with "DID NOT RAISE ValueError". Failure only
surfaced when test_hermes_providers ran first.

Root cause: test_hermes_providers had an autouse fixture that used
monkeypatch.delenv on entry, but several tests in that file mutate
os.environ directly (e.g. `os.environ["HERMES_API_KEY"] = "test"`),
bypassing monkeypatch. monkeypatch only tracks its own deltas, so on
fixture teardown the direct-mutation values stayed in os.environ.
HERMES_API_KEY leaked across file boundaries into test_hermes_smoke,
which then saw a key present when it expected absence.

Fix: replace monkeypatch-based fixture with pure snapshot/restore:
- Snapshot all provider env vars at entry
- Clear them
- yield (test runs, may mutate freely)
- try/finally restore the exact pre-test state

This is deterministic regardless of whether a test uses monkeypatch,
direct mutation, or neither. Also adds a comment documenting WHY we
switched away from monkeypatch so a future reviewer doesn't revert.

Full workspace-template suite: 1169 passed, 9 skipped, 2 xfailed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:59:48 -07:00
rabbitblood
adcaa69e42 feat(hermes): Phase 2b — native Google Gemini generateContent dispatch path
Completes Hermes Phase 2 by adding the second native SDK path: Google Gemini
via the official `google-genai` Python SDK. Stacked on top of Phase 2a
(feat/hermes-phase2-native-sdks) which introduced the dispatch infra +
the anthropic native path.

## What's new in this PR

1. `providers.py`: flip `gemini` entry to `auth_scheme="gemini"` and
   update `base_url` from the OpenAI-compat endpoint
   (`/v1beta/openai`) to the bare host
   (`https://generativelanguage.googleapis.com`) which the native SDK
   uses.

2. `executor.py`: new method `_do_gemini_native(task_text)` that uses
   `google.genai.Client().aio.models.generate_content(...)`. Dispatch
   table in `_do_inference` now routes `"gemini"` → `_do_gemini_native`.
   Same fail-loud semantics as `_do_anthropic_native` — missing SDK
   raises a clear RuntimeError with install instructions.

3. `requirements.txt`: add `google-genai>=1.0.0`.

4. `test_hermes_phase2_dispatch.py`: +3 tests
   - `test_gemini_entry_has_gemini_scheme` — registry flip + base URL
     validated
   - `test_dispatch_gemini_scheme_calls_gemini_native` — dispatch runs
     gemini native, not openai-compat or anthropic-native
   - `test_gemini_native_raises_clear_error_when_sdk_missing` — fail-loud
     on missing `google-genai` package
   Plus updated existing dispatch tests to mock `_do_gemini_native`
   alongside the other paths so "no cross-calls" assertions stay tight.

All 36 tests pass locally (10 Phase 2 dispatch + 26 Phase 1 registry):

    pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
    36 passed in 0.07s

## Dispatch table after this PR

    auth_scheme="openai"     → _do_openai_compat (13 providers)
    auth_scheme="anthropic"  → _do_anthropic_native (1 provider, Phase 2a)
    auth_scheme="gemini"     → _do_gemini_native (1 provider, Phase 2b) ← NEW
    <unknown>                → _do_openai_compat + warning (forward-compat)

## Back-compat

- All 13 openai-scheme providers unchanged
- `hermes_api_key` / `HERMES_API_KEY` / `OPENROUTER_API_KEY` paths unchanged
- Only `gemini` provider changes behavior: now uses native generateContent
  instead of the `/v1beta/openai` compat shim
- Existing Gemini callers setting `GEMINI_API_KEY` get the native path
  automatically — no caller changes needed

## What's NOT in this PR (future phases)

- Streaming support (`astream_messages` / `streamGenerateContent` stream
  variants) for either native path
- Tool calling / function calling on native paths
- Vision content blocks (image_url → anthropic image blocks; image_url →
  gemini inline_data with base64 + mime_type)
- Extended thinking (anthropic) / thinking config (gemini)
- System instructions pass-through on the gemini native path

Phase 2c/2d will layer these on. This PR is the minimum-viable native
dispatch — single-turn text in, text out — same shape as Phase 2a.

## Stacking

This PR targets `feat/hermes-phase2-native-sdks` (Phase 2a) as its base
branch, NOT main, so the diff shows only the Gemini-specific additions.
When Phase 2a merges to main, GitHub auto-rebases this PR onto the new
main head. If reviewer prefers a single combined PR, close #240 and land
this one instead — the commits on feat/hermes-phase2-native-sdks are
already included in this branch's history.

## Related

- #240 Phase 2a (parent branch)
- #208 Phase 1 (registry + openai-compat path — already in main)
- `project_hermes_multi_provider.md` queued memory — Phase 2 was the next
  item, this PR completes it
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's
  eco-watch entry that catalogued Hermes's native provider list and
  shaped the original Phase 2 scope
2026-04-15 13:20:39 -07:00
rabbitblood
3dd8df585e feat(hermes): Phase 2a — native Anthropic Messages API dispatch path
Completes the Hermes adapter's native-SDK plan for the provider that gains
the most from leaving OpenAI-compat: Anthropic. OpenAI-compat works fine for
plain text turns on every provider (Phase 1 covered that with one code path
for all 15 providers), but Anthropic's Messages API has first-class tool use,
vision content blocks, and extended thinking that the OpenAI-compat shim
strips or mis-translates.

Rather than ship all native SDK paths in one PR (Anthropic + Gemini + future),
this lands Anthropic only (Phase 2a). Gemini is Phase 2b, shipping after a
production measurement window on Phase 2a.

## Design

Providers now dispatch by `auth_scheme` field. Phase 1 added the field but
every provider used `"openai"`. Phase 2 flips `anthropic` to `"anthropic"`
and wires a second inference path keyed on that:

- `HermesA2AExecutor._do_openai_compat(task_text)` — existing path, handles
  14 of 15 providers (Nous Portal, OpenRouter, OpenAI, xAI, Gemini, Qwen,
  GLM, Kimi, MiniMax, DeepSeek, Groq, Together, Fireworks, Mistral)
- `HermesA2AExecutor._do_anthropic_native(task_text)` — NEW, uses the
  official `anthropic` Python SDK's `AsyncAnthropic().messages.create(...)`
- `HermesA2AExecutor._do_inference(task_text)` — dispatches by
  `self.provider_cfg.auth_scheme`

Unknown schemes fall back to OpenAI-compat with a logged warning, so future
provider additions don't crash if a native SDK path ships late.

## Fail-loud on missing SDK

`_do_anthropic_native` raises a clear `RuntimeError` with install
instructions if the `anthropic` package is missing at runtime:

    Hermes anthropic native path requires the `anthropic` package. Install
    in the workspace image with `pip install anthropic>=0.39.0` or set
    HERMES provider=openrouter to route Claude models through OpenRouter's
    OpenAI-compat shim instead.

This is intentional: silent fallback would mask fidelity loss (tool_use
blocks become plain text, vision gets stripped). Loud failure is better.

`requirements.txt` adds `anthropic>=0.39.0` so the package is baked into
the workspace-template image build path. Operators building custom workspace
images without anthropic installed get the loud error.

## Back-compat

- `create_executor(hermes_api_key="x")` → still routes to Nous Portal
  (`auth_scheme="openai"`), unchanged
- `HERMES_API_KEY` env var → still first in RESOLUTION_ORDER
- `OPENROUTER_API_KEY` env var → still second
- All 14 OpenAI-compat providers unchanged — they take the same code path
  as before
- ONLY `anthropic` provider changes behavior: it now uses the native
  Messages API instead of the `/v1/chat/completions` compat shim

## Constructor signature change

`HermesA2AExecutor.__init__` now takes `provider_cfg: ProviderConfig`
instead of separate `api_key + base_url + model`. The three fields are
derived from `provider_cfg` + an optional model override. This is a
breaking change for any external caller building an executor directly,
but the only documented public entry point is `create_executor()`, which
is updated in the same commit to pass the cfg through.

## Test coverage

`workspace-template/tests/test_hermes_phase2_dispatch.py` — 7 new tests:

1. `test_anthropic_entry_has_anthropic_scheme` — registry flip
2. `test_all_other_providers_still_openai_scheme` — regression guard
3. `test_dispatch_openai_scheme_calls_openai_compat` — happy path
4. `test_dispatch_anthropic_scheme_calls_anthropic_native` — happy path
5. `test_dispatch_unknown_scheme_falls_back_to_openai_compat` — forward compat
6. `test_anthropic_native_raises_clear_error_when_sdk_missing` — fail-loud
7. `test_create_executor_passes_provider_cfg` — constructor wiring

All pass locally (pytest tests/test_hermes_phase2_dispatch.py -v, 0.04s).
Phase 1 tests unchanged: `test_hermes_providers.py` 26/26 pass, no
regressions.

## What's NOT in this PR (Phase 2b)

- Gemini native `generateContent` path (`auth_scheme="gemini"`)
- Streaming support across both native paths (`astream_messages`, `streamGenerateContent`)
- Tool calling on the anthropic native path (the `tools` + `tool_use` blocks)
- Vision content blocks (image_url → anthropic image blocks)
- Extended thinking parameter passthrough

All scoped in `project_hermes_multi_provider.md`. Phase 2a is the minimum
viable native Anthropic dispatch — single-turn text in, text out, no tools.

## Related

- Phase 1 baseline (already in main): #208 — provider registry + OpenAI-compat path
- Queued memory: `project_hermes_multi_provider.md` — full phased plan
- Triggering directive: CEO 2026-04-15 — "once current works are cleared,
  focus on supporting hermes agent"
2026-04-15 12:23:56 -07:00
Hongming Wang
279f5fd672 fix(workspace-template): #220 — send auth_headers() on initial_prompt + idle loop
Closes #220. #215 added auth_headers() to /registry/register but missed
two other self-post paths from the same workspace container:

1. initial_prompt (_do_send_sync) — fires once on first boot after the
   A2A server is ready. Posts to /workspaces/:id/a2a via the platform
   proxy. Missing headers meant the initial prompt got silently
   dropped as 401 on any token-enrolled workspace.

2. idle loop (_post_sync) — fires every idle_interval_seconds while
   the workspace has no active task (#205 pattern). Same proxy path,
   same missing headers, same silent 401 in multi-tenant mode.

Both now build headers as
  {"Content-Type": "application/json", **auth_headers()}

auth_headers() returns {"Authorization": "Bearer <token>"} when
/auth-token.txt exists, empty dict otherwise (first boot before
register issues the token). The existing lazy-bootstrap fail-open
on the platform side covers the empty-dict case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 12:02:01 -07:00
Hongming Wang
dacd78b8f9
Merge pull request #231 from Molecule-AI/fix/160-sdk-error-probe
fix(claude-sdk): #160 — probe CLI directly when SDK swallowed the real stderr
2026-04-15 11:58:59 -07:00
Hongming Wang
626fb3e803 Merge branch 'main' into fix/160-sdk-error-probe 2026-04-15 11:54:13 -07:00