Adds _load_config_dict() helper to ClaudeSDKExecutor and wires the new
effort and task_budget config fields into _build_options() before the
Anthropic API call:
- effort (str): low|medium|high|xhigh|max — populates output_config.effort
- task_budget (int): advisory total-token budget; must be >= 20000 when set;
automatically adds task-budgets-2026-03-13 beta header
Also adds WorkspaceConfig.effort and WorkspaceConfig.task_budget fields in
config.py and 5 acceptance tests covering all code paths.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Context: when the claude-agent-sdk wraps a stream error from the CLI
subprocess that it can't categorize (rate limit, auth, network), it
raises a bare `Exception("Command failed with exit code 1\nError output:
Check stderr output for details")`. The exception has no `.stderr` or
`.exit_code` attributes, so #66's `_format_process_error` — which reads
those attributes — has nothing to surface. The log line becomes:
SDK agent error [claude-code]: Exception: Command failed with exit
code 1 (exit code: 1)\nError output: Check stderr output for details
That's the placeholder text from the SDK's error path, not the actual
error. Operators chasing a stuck workspace are forced to `docker exec
ws-xxx claude --print` manually to discover the real cause. Observed
today during the rate-limit incident: every PM error line was identical
"Check stderr output for details" while the real cause ("You've hit
your limit · resets Apr 17, 11pm (UTC)") was only visible via manual
reproduction — that cost ~20 minutes of diagnosis time.
## Fix
Add `_probe_claude_cli_error()`: a best-effort subprocess call that runs
`claude --print` with a small probe input, captures stderr+stdout, and
returns the real error string. Bounded by 30s timeout so a hung CLI
can't stall the error path.
Extend `_format_process_error` with ONE narrow fallback: if the
exception has no stderr/exit_code AND its message contains the specific
"Check stderr output for details" marker, call the probe and append
`probed_cli_error=<real error>` to the formatted line.
Critically: the probe only runs in the narrow case where we have
nothing else to log. If `.stderr` or `.exit_code` are present (the
normal ProcessError path from #66), the probe is skipped — no wasted
subprocess, no 30s latency on every error.
## Test coverage
`workspace-template/tests/test_claude_sdk_executor.py` adds 3 new tests:
- `test_format_process_error_probes_cli_when_stderr_swallowed` — the
happy path: exception matches the marker, probe runs, result appears
in the formatted line. Probe is monkeypatched so no subprocess spawns
in the test.
- `test_format_process_error_does_not_probe_when_stderr_already_present` —
negative: regular ProcessError with `.stderr` set does NOT trigger
the probe (skip the wasted call).
- `test_format_process_error_does_not_probe_without_swallowed_marker` —
negative: unrelated plain exceptions (e.g. RuntimeError) do NOT
trigger the probe (so the common-case error path stays fast).
All 7 `_format_process_error` tests pass locally (4 existing + 3 new):
\`\`\`
pytest tests/test_claude_sdk_executor.py -k format_process_error
======================= 7 passed in 0.06s ========================
\`\`\`
## Impact
Next time the SDK swallows a real error (rate limit, auth failure,
network outage), the workspace log will contain the actual error string
alongside the generic placeholder:
SDK agent error [claude-code]: Exception: Command failed with exit
code 1 ... | probed_cli_error="You've hit your limit · resets Apr
17, 11pm (UTC)"
Diagnosis time drops from "docker exec each ws, run claude --print,
read stderr" (~20 min) to "grep probed_cli_error in platform logs"
(~10 seconds).
Closes#160.