hermes-agent/tests
Teknium 285bb2b915
feat(execute_code): add project/strict execution modes, default to project (#11971)
Weaker models (Gemma-class) repeatedly rediscover and forget that
execute_code uses a different CWD and Python interpreter than terminal(),
causing them to flip-flop on whether user files exist and to hit import
errors on project dependencies like pandas.

Adds a new 'code_execution.mode' config key (default 'project') that
brings execute_code into line with terminal()'s filesystem/interpreter:

  project (new default):
    - cwd       = session's TERMINAL_CWD (falls back to os.getcwd())
    - python    = active VIRTUAL_ENV/bin/python or CONDA_PREFIX/bin/python
                  with a Python 3.8+ version check; falls back cleanly to
                  sys.executable if no venv or the candidate fails
    - result    : 'import pandas' works, '.env' resolves, matches terminal()

  strict (opt-in):
    - cwd       = staging tmpdir (today's behavior)
    - python    = sys.executable (today's behavior)
    - result    : maximum reproducibility and isolation; project deps
                  won't resolve

Security-critical invariants are identical across both modes and covered by
explicit regression tests:

  - env scrubbing (strips *_API_KEY, *_TOKEN, *_SECRET, *_PASSWORD,
    *_CREDENTIAL, *_PASSWD, *_AUTH substrings)
  - SANDBOX_ALLOWED_TOOLS whitelist (no execute_code recursion, no
    delegate_task, no MCP from inside scripts)
  - resource caps (5-min timeout, 50KB stdout, 50 tool calls)

Deliberately avoids 'sandbox'/'isolated'/'cloud' language in tool
descriptions (regression from commit 39b83f34 where agents on local
backends falsely believed they were sandboxed and refused networking).

Override via env var: HERMES_EXECUTE_CODE_MODE=strict|project
2026-04-18 01:46:25 -07:00
..
acp fix(acp): improve zed integration 2026-04-17 13:29:26 -07:00
agent test: update stale tests to match current code (#11963) 2026-04-17 21:35:30 -07:00
cli test: update stale tests to match current code (#11963) 2026-04-17 21:35:30 -07:00
cron feat(cron+tests): extend origin fallback to email/dingtalk/qqbot + fix Weixin test mocks 2026-04-17 06:26:43 -07:00
e2e
environments/benchmarks
fakes
gateway test: update stale tests to match current code (#11963) 2026-04-17 21:35:30 -07:00
hermes_cli feat(execute_code): add project/strict execution modes, default to project (#11971) 2026-04-18 01:46:25 -07:00
honcho_plugin
integration
plugins test: speed up slow tests (backoff + subprocess + IMDS network) (#11797) 2026-04-17 14:21:22 -07:00
run_agent fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907) 2026-04-17 20:39:25 -07:00
skills
tools feat(execute_code): add project/strict execution modes, default to project (#11971) 2026-04-18 01:46:25 -07:00
tui_gateway
__init__.py
conftest.py Support browser CDP URL from config 2026-04-17 16:05:04 -07:00
run_interrupt_test.py
test_batch_runner_checkpoint.py
test_cli_file_drop.py
test_cli_skin_integration.py
test_ctx_halving_fix.py
test_empty_model_fallback.py
test_evidence_store.py
test_hermes_constants.py
test_hermes_logging.py
test_hermes_state.py
test_honcho_client_config.py
test_ipv4_preference.py
test_mcp_serve.py
test_mini_swe_runner.py fix(kimi): cover remaining fixed-temperature bypasses 2026-04-17 20:25:42 -07:00
test_minisweagent_path.py
test_model_picker_scroll.py
test_model_tools_async_bridge.py
test_model_tools.py
test_ollama_num_ctx.py
test_packaging_metadata.py
test_plugin_skills.py
test_project_metadata.py build(deps): add qrcode to dingtalk + feishu extras (parity with messaging) (#11627) 2026-04-17 13:31:53 -07:00
test_retry_utils.py
test_sql_injection.py
test_subprocess_home_isolation.py
test_timezone.py test: speed up slow tests (backoff + subprocess + IMDS network) (#11797) 2026-04-17 14:21:22 -07:00
test_toolset_distributions.py
test_toolsets.py
test_trajectory_compressor_async.py fix(kimi): cover remaining fixed-temperature bypasses 2026-04-17 20:25:42 -07:00
test_trajectory_compressor.py fix(kimi): cover remaining fixed-temperature bypasses 2026-04-17 20:25:42 -07:00
test_tui_gateway_server.py fix(tui): first-run setup preflight + actionable no-provider panel 2026-04-17 10:58:01 -05:00
test_utils_truthy_values.py