PR #763 (feat/issue-733-agents-md-impl) branched before PR #743 landed the
claude-opus-4-7 model default upgrade. config.py still had the old
claude-sonnet-4-6 default, which would have silently regressed the upgrade.
Restore both occurrences:
- WorkspaceConfig.model default: claude-sonnet-4-6 → claude-opus-4-7
- load_config() fallback: claude-sonnet-4-6 → claude-opus-4-7
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Turns the QA TDD spec from PR #755 GREEN: all 14 tests pass.
Changes:
- workspace-template/agents_md.py (new): generate_agents_md(config_dir, output_path)
Writes AAIF-compliant AGENTS.md with name, role, description, A2A endpoint,
and MCP tools sections. AGENT_URL env var overrides the derived localhost URL.
Falls back to description when role is absent (graceful legacy compat).
Always overwrites — no stale-file guard.
- workspace-template/config.py: add role field to WorkspaceConfig
New top-level field `role: str = ""` with load_config support.
When role is empty, agents_md.py falls back to description for backward compat.
- workspace-template/main.py: wire generate_agents_md into startup (step 1a)
Fires after load_config + preflight. Non-fatal: exception is caught and
printed as a warning so a bad /workspace mount never kills the agent.
- workspace-template/tests/test_agents_md.py (new): pulled from PR #755 branch
Test results:
pytest tests/test_agents_md.py -v → 14 passed (was: 14 RED / import error)
pytest (full suite) → 1044 passed, 2 xfailed
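A minimal sketch of the fallback and override rules described for agents_md.py above. The helper names `render_agents_md` and `write_agents_md`, the section headings, the `tools` key, and the default port are assumptions for illustration; the real `generate_agents_md(config_dir, output_path)` reads its inputs from the config directory and may lay the file out differently.

```python
import os


def render_agents_md(config: dict, default_port: int = 8000) -> str:
    """Hypothetical renderer: headings, keys, and port are illustrative only."""
    name = config.get("name", "")
    description = config.get("description", "")
    # Fall back to description when role is absent or empty (legacy configs).
    role = config.get("role") or description
    # AGENT_URL, when set, overrides the derived localhost URL.
    url = os.environ.get("AGENT_URL") or f"http://localhost:{default_port}"
    tools = config.get("tools", [])

    lines = [
        f"# {name}", "",
        "## Role", role, "",
        "## Description", description, "",
        "## A2A Endpoint", url, "",
        "## MCP Tools",
    ]
    lines += [f"- {tool}" for tool in tools] or ["(none)"]
    return "\n".join(lines) + "\n"


def write_agents_md(config: dict, output_path: str) -> None:
    # Always overwrites: mode "w" truncates any existing file, no stale-file guard.
    with open(output_path, "w", encoding="utf-8") as fh:
        fh.write(render_agents_md(config))
```

At startup the call would be wrapped in a try/except that prints a warning, so that a failed write stays non-fatal as described for main.py.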
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the anthropic:claude-sonnet-4-6 default across config, handlers,
env example, and litellm proxy config. All tests updated to match the new
default; sonnet-4-6 alias kept in litellm_config.yml for pinned workspaces.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds _load_config_dict() helper to ClaudeSDKExecutor and wires the new
effort and task_budget config fields into _build_options() before the
Anthropic API call:
- effort (str): low|medium|high|xhigh|max — populates output_config.effort
- task_budget (int): advisory total-token budget; must be >= 20000 when set;
automatically adds task-budgets-2026-03-13 beta header
Also adds WorkspaceConfig.effort and WorkspaceConfig.task_budget fields in
config.py and 5 acceptance tests covering all code paths.
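A hedged sketch of the validation rules listed above, not the actual `_build_options()` code. The function name `build_option_overrides`, the shape of the returned dict, and the `task_budget` and `betas` keys are assumptions; only the allowed effort values, the 20000 minimum, and the beta header name come from this commit message.

```python
VALID_EFFORTS = {"low", "medium", "high", "xhigh", "max"}
TASK_BUDGET_BETA = "task-budgets-2026-03-13"
MIN_TASK_BUDGET = 20000


def build_option_overrides(config: dict) -> dict:
    """Hypothetical helper: turn config fields into request overrides."""
    overrides: dict = {}

    effort = config.get("effort")
    if effort:
        if effort not in VALID_EFFORTS:
            raise ValueError(f"effort must be one of {sorted(VALID_EFFORTS)}")
        # effort populates output_config.effort
        overrides["output_config"] = {"effort": effort}

    budget = config.get("task_budget")
    if budget is not None:  # assumption: None means "not set"
        if budget < MIN_TASK_BUDGET:
            raise ValueError(f"task_budget must be >= {MIN_TASK_BUDGET}")
        overrides["task_budget"] = budget
        # Setting a budget automatically adds the beta header.
        overrides["betas"] = [TASK_BUDGET_BETA]

    return overrides
```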
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Today's multi-framework research (Hermes, Letta, Trigger.dev, Inngest, AG2,
Rivet, n8n, Composio, SWE-agent; see docs/ecosystem-watch.md) confirmed
that none of them runs a while(true) loop per agent. The working patterns are:
(a) event-driven + hibernation (Hermes, Letta, Trigger.dev, Inngest)
(b) cron/user-triggered ephemeral runs (AG2, Rivet, n8n, SWE-agent)
Molecule AI is currently 100% in category (b). Observed team utilization:
~0.5%. Agents idle 99.5% of the time because cron fires and CEO-typed
A2A messages are the only initiating signals. The CEO's north star is 24/7
iteration, and the current cadence falls short of it.
This PR closes the gap by adding an in-workspace idle loop that wakes the
agent periodically ONLY when it has no active task. The shape is the
Hermes reflection-on-completion pattern combined with the Letta backlog-pull
pattern, collapsed into a ~60 LOC change in the workspace-template. Zero
new Go code. Zero new DB tables. Zero new API endpoints.
## How it works
1. `config.py` gets two new fields on WorkspaceConfig:
- `idle_prompt: str = ""` — the prompt to self-send when idle
- `idle_interval_seconds: int = 600` — how often to check (default 10 min)
Both support inline or file ref (matching the initial_prompt pattern).
2. `main.py` spawns an `_run_idle_loop()` asyncio task alongside the
existing initial_prompt task (same lifecycle hooks — cancelled in the
`finally:` of the server.serve() block).
3. The loop body:
a. Sleep for `idle_interval_seconds`
b. Check `heartbeat.active_tasks == 0` LOCALLY (no LLM call, no HTTP)
c. If idle → self-POST the idle_prompt via the existing /workspaces/{id}/a2a proxy
d. Loop
The agent's own concurrency control rejects the post if it becomes busy
between the check and the POST — that's the safety valve.
4. Gated on `config.idle_prompt` being non-empty. The default is "", which disables the loop.
Existing workspaces upgrade silently as no-ops until someone explicitly
opts in by setting idle_prompt in org.yaml (either defaults: or
per-workspace:).
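The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `main.py`: the `Heartbeat` dataclass is a stand-in for the real heartbeat object, and `send` is a hypothetical injected coroutine that would perform the self-POST through the /workspaces/{id}/a2a proxy. Logging a failed or rejected POST and continuing is also an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Heartbeat:
    active_tasks: int = 0  # stand-in for the real heartbeat state


async def run_idle_loop(
    idle_prompt: str,
    interval_seconds: float,
    heartbeat: Heartbeat,
    send: Callable[[str], Awaitable[None]],
) -> None:
    # Gate: an empty idle_prompt means the loop never starts.
    if not idle_prompt:
        return
    while True:
        await asyncio.sleep(interval_seconds)
        # Local check only: no LLM call and no HTTP request while busy.
        if heartbeat.active_tasks == 0:
            try:
                await send(idle_prompt)
            except Exception as exc:
                # Assumption: a rejected or failed POST is logged and skipped.
                print(f"idle loop: send failed: {exc}")
```

The loop would be started with `asyncio.create_task(...)` next to the initial_prompt task and cancelled in the same `finally:` block, so cancellation surfaces as `asyncio.CancelledError` inside `asyncio.sleep`.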
## Cost analysis (from the research report)
- while(true) pattern: ~$93/day/org (12 agents × 12 thinks/hour × $0.027). Unshippable.
- Hermes reflection-on-completion: ~$0.45/day/org. Cost ∝ useful work.
- This PR's idle loop at 10-min cadence: upper bound 12 × 6/hour × 24h
× ~3k tokens × Sonnet rate ≈ $5/day/org PER ROLE, only if they're
genuinely idle every check. In practice far less because busy periods
skip the LLM call entirely (the active_tasks check is local).
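The call counts behind these figures can be checked with a few lines of arithmetic. This sketch reproduces only the while(true) dollar figure, since its per-think cost is given above; the idle-loop dollar figure depends on a per-token rate that is not stated here, so only its call ceiling is computed. The variable names are illustrative.

```python
AGENTS = 12
HOURS_PER_DAY = 24

# while(true) pattern: 12 thinks per hour per agent at $0.027 per think.
while_true_calls = AGENTS * 12 * HOURS_PER_DAY
while_true_cost = while_true_calls * 0.027
print(while_true_calls, round(while_true_cost, 2))  # 3456 93.31

# Idle loop: one wake every 600 s is 6 per hour per agent.
# This is a ceiling, reached only if every agent is idle at every check.
wakes_per_hour = 3600 // 600
idle_calls_max = AGENTS * wakes_per_hour * HOURS_PER_DAY
print(idle_calls_max)  # 1728
```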
## Rollout plan
The research report recommended rolling to ONE workspace first (Technical
Researcher) and measuring 24h of activity_logs before enabling for
all 12. This PR enables the mechanism; it does NOT add any default
idle_prompt to org-templates/molecule-dev/org.yaml. That's a follow-up
PR after this one lands and one workspace has been manually opted in
for measurement.
## Not touched in this PR
- No Go code (no new platform endpoint, no new DB columns)
- No org.yaml changes (zero-impact until someone opts in)
- No scheduler changes (the idle loop is a workspace concern, not a
scheduler concern — matches the research report's layering)
## Test plan
- [x] Python syntax check (ast.parse) on main.py + config.py
- [ ] Unit test: WorkspaceConfig parses idle_prompt / idle_interval_seconds from yaml
- [ ] Integration test: set idle_prompt on Technical Researcher, verify that
an A2A message is received every ~10 min while idle, and NOT received
while busy with a delegation
- [ ] Dogfood: enable on Technical Researcher for 24h, count activity_logs
delta vs baseline, confirm cost stays within model
## Related
- Today's research report (conversation output, summarized in commit trailer)
- docs/ecosystem-watch.md → `### Hermes Agent` (the canonical reflection-on-completion example)
- #159 orchestrator/worker split — complementary: leaders pulse for dispatch,
workers idle-loop for pull. Together: leaders push work, workers pull work,
no role ever sits idle with a cold queue.