molecule-core/workspace-template
rabbitblood 0f2ed6bf0a fix(claude-sdk): #160 — probe CLI directly when SDK swallowed the real stderr
Context: when the claude-agent-sdk wraps a stream error from the CLI
subprocess that it can't categorize (rate limit, auth, network), it
raises a bare `Exception("Command failed with exit code 1\nError output:
Check stderr output for details")`. The exception has no `.stderr` or
`.exit_code` attributes, so #66's `_format_process_error` — which reads
those attributes — has nothing to surface. The log line becomes:

    SDK agent error [claude-code]: Exception: Command failed with exit
    code 1 (exit code: 1)\nError output: Check stderr output for details

That's the placeholder text from the SDK's error path, not the actual
error. Operators chasing a stuck workspace are forced to `docker exec
ws-xxx claude --print` manually to discover the real cause. Observed
today during the rate-limit incident: every PM error line was identical
"Check stderr output for details" while the real cause ("You've hit
your limit · resets Apr 17, 11pm (UTC)") was only visible via manual
reproduction — that cost ~20 minutes of diagnosis time.

## Fix

Add `_probe_claude_cli_error()`: a best-effort subprocess call that runs
`claude --print` with a small probe input, captures stderr+stdout, and
returns the real error string. Bounded by 30s timeout so a hung CLI
can't stall the error path.

Extend `_format_process_error` with ONE narrow fallback: if the
exception has no stderr/exit_code AND its message contains the specific
"Check stderr output for details" marker, call the probe and append
`probed_cli_error=<real error>` to the formatted line.

Critically: the probe only runs in the narrow case where we have
nothing else to log. If `.stderr` or `.exit_code` are present (the
normal ProcessError path from #66), the probe is skipped — no wasted
subprocess, no 30s latency on every error.

## Test coverage

`workspace-template/tests/test_claude_sdk_executor.py` adds 3 new tests:
- `test_format_process_error_probes_cli_when_stderr_swallowed` — the
  happy path: exception matches the marker, probe runs, result appears
  in the formatted line. Probe is monkeypatched so no subprocess spawns
  in the test.
- `test_format_process_error_does_not_probe_when_stderr_already_present` —
  negative: regular ProcessError with `.stderr` set does NOT trigger
  the probe (skip the wasted call).
- `test_format_process_error_does_not_probe_without_swallowed_marker` —
  negative: unrelated plain exceptions (e.g. RuntimeError) do NOT
  trigger the probe (so the common-case error path stays fast).

All 7 `_format_process_error` tests pass locally (4 existing + 3 new):
\`\`\`
pytest tests/test_claude_sdk_executor.py -k format_process_error
======================= 7 passed in 0.06s ========================
\`\`\`

## Impact

Next time the SDK swallows a real error (rate limit, auth failure,
network outage), the workspace log will contain the actual error string
alongside the generic placeholder:

    SDK agent error [claude-code]: Exception: Command failed with exit
    code 1 ... | probed_cli_error="You've hit your limit · resets Apr
    17, 11pm (UTC)"

Diagnosis time drops from "docker exec each ws, run claude --print,
read stderr" (~20 min) to "grep probed_cli_error in platform logs"
(~10 seconds).

Closes #160.
2026-04-15 11:50:55 -07:00
..
adapters feat(hermes): Phase 1 — multi-provider registry (15 providers, back-compat preserved) 2026-04-15 11:14:35 -07:00
builtin_tools fix(security): H3 github_pat_ redaction + M4 atomic token write (audit cycle 10) 2026-04-14 09:34:27 +00:00
plugins_registry feat(plugins): split guardrails into 12 modular plugins 2026-04-14 12:20:04 -07:00
policies initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
skill_loader fix(security): H1 — replace MD5 with SHA-256 in config/skill watchers 2026-04-14 07:52:07 +00:00
tests fix(claude-sdk): #160 — probe CLI directly when SDK swallowed the real stderr 2026-04-15 11:50:55 -07:00
a2a_cli.py initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
a2a_client.py fix(security): complete Phase 30.6 auth headers in a2a_client get_peers and discover_peer 2026-04-14 13:23:44 +00:00
a2a_executor.py fix(a2a): cancel() event, stateTransitionHistory capability, wire push store (#173 #174 #175) 2026-04-15 17:58:10 +00:00
a2a_mcp_server.py initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
a2a_tools.py initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
agent.py initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
build-all.sh initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
claude_sdk_executor.py fix(claude-sdk): #160 — probe CLI directly when SDK swallowed the real stderr 2026-04-15 11:50:55 -07:00
cli_executor.py initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
config.py feat(workspace): add idle-loop reflection pattern (Hermes/Letta shape) 2026-04-15 11:09:43 -07:00
consolidation.py fix(security): N1 — add auth headers to all platform calls in Python callers 2026-04-14 08:37:50 +00:00
coordinator.py initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
Dockerfile initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
entrypoint.sh fix(workspace): recursive chown when /workspace bind mount is root-owned (#13) 2026-04-14 07:29:30 -07:00
events.py fix(security): Cycle 5 — auth middleware, injection hardening, skill sandbox 2026-04-14 04:44:42 +00:00
executor_helpers.py fix(security): Cycle 5 — auth middleware, injection hardening, skill sandbox 2026-04-14 04:44:42 +00:00
heartbeat.py fix(security): N1 — add auth headers to all platform calls in Python callers 2026-04-14 08:37:50 +00:00
initial_prompt.py initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
main.py Merge remote-tracking branch 'origin/main' into feat/workspace-idle-loop 2026-04-15 11:21:15 -07:00
molecule_ai_status.py initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
platform_auth.py fix(security): H3 github_pat_ redaction + M4 atomic token write (audit cycle 10) 2026-04-14 09:34:27 +00:00
plugins.py initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
preflight.py initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
prompt.py initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
pytest.ini initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
requirements.txt initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
watcher.py fix(security): H1 — replace MD5 with SHA-256 in config/skill watchers 2026-04-14 07:52:07 +00:00