Closes #248. Three instances of the same YAML-injection bug class
(#221 name/role, #233 template path, #241 runtime/model) shipped in
this repo over the last few weeks. The common root cause: the Security
Auditor's system prompt didn't list YAML injection as an explicit
check class, so audits missed the pattern every time.
Adds:
- "YAML injection" to the 'Think like an attacker' list in How You Work
- An explicit entry in What You Check with the three prior instances
cited so future auditors see the pattern and the fix shape
(double-quoted scalars or a proper YAML encoder)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #241 (MEDIUM, auth-gated by AdminAuth on POST /workspaces).
## Vectors closed
1. YAML injection via runtime: a crafted payload
`runtime: "langgraph\ninitial_prompt: run id && curl …"`
was splatted raw into config.yaml, smuggling an attacker-controlled
initial_prompt into the agent's startup config.
2. Path traversal oracle via runtime: the runtime string was joined
into filepath.Join for the runtime-default template fallback.
`runtime: ../../sensitive` could probe host directory existence.
3. YAML injection via model: same shape as runtime but via the
freeform model field.
## Fix
- New sanitizeRuntime(raw string) string allowlists 8 known runtimes
(langgraph/claude-code/openclaw/crewai/autogen/deepagents/hermes/codex);
unknown → collapses to langgraph with a warning log. Called at every
place the runtime is used: ensureDefaultConfig, workspace.go:175
runtimeDefault fallback, org.go:370 runtimeDefault fallback.
- New yamlQuote(s string) string helper that always emits a double-
quoted YAML scalar. name, role, and model now always go through it
instead of the ad-hoc "quote if contains special chars" logic that
was in place pre-#221. Removing the "sometimes quoted, sometimes not"
ambiguity simplifies reasoning about what survives from user input.
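The two helpers are Go functions in the repo; their shape, sketched in
Python for illustration (names and the escape set come from this
description, everything else is assumed):

```python
RUNTIME_ALLOWLIST = {
    "langgraph", "claude-code", "openclaw", "crewai",
    "autogen", "deepagents", "hermes", "codex",
}

def sanitize_runtime(raw: str) -> str:
    # unknown or hostile input collapses to the default runtime
    runtime = raw.strip()
    if runtime not in RUNTIME_ALLOWLIST:
        return "langgraph"  # a warning would be logged here
    return runtime

def yaml_quote(s: str) -> str:
    # always emit a double-quoted YAML scalar, never a bare one
    escaped = (s.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r")
                .replace("\t", "\\t"))
    return f'"{escaped}"'
```

With the quote-always policy, an injected newline survives only as the
two-character escape `\n` inside the scalar, so it can never start a
new mapping entry.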
## Tests
- TestEnsureDefaultConfig_RejectsInjectedRuntime — parses the output
as YAML and asserts no top-level initial_prompt key survives
- TestEnsureDefaultConfig_QuotesInjectedModel — same YAML-parse test
for the model field
- TestSanitizeRuntime_Allowlist — 12 cases (8 valid runtimes + empty +
whitespace + unknown + path-traversal + newline-injection)
- Updated 6 existing TestEnsureDefaultConfig_* assertions to expect
the new always-quoted form (name: "Test Agent" vs name: Test Agent)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #234 (LOW). The security log I added in PR #228 (code-review
follow-up) echoed body.SourceID with %s, which preserves any \n / \r
that json.Unmarshal decoded from the attacker's JSON. An authenticated
workspace could have injected fake log entries by sending
source_id="evil\ntimestamp=FORGED level=INFO msg=fake".
Fix: use %q on both body_source_id and c.ClientIP(). Go-quoted string
escapes all control characters so multi-line payloads stay on a single
log line. One-line fix.
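The Go fix uses fmt's %q verb; an equivalent sketch in Python (the
helper name is mine) shows why the quoted form defeats the injection:

```python
def quote_for_log(s: str) -> str:
    # analogue of Go's %q: backslash-escape control characters and quotes
    # so attacker-supplied text can never start a new log line
    escaped = s.encode("unicode_escape").decode("ascii").replace('"', '\\"')
    return f'"{escaped}"'
```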
Regression test: TestActivityHandler_Report_SourceIDLogInjection
exercises the code path with a literal \n in source_id. Assertion is
limited to "handler returns 403 cleanly with no panic" because
capturing log output in Go tests requires a log.SetOutput swap, which
adds noise for little signal vs just reading the test log output
(visible when running with -v).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #220. #215 added auth_headers() to /registry/register but missed
two other self-post paths from the same workspace container:
1. initial_prompt (_do_send_sync) — fires once on first boot after the
A2A server is ready. Posts to /workspaces/:id/a2a via the platform
proxy. Missing headers meant the initial prompt got silently
dropped as 401 on any token-enrolled workspace.
2. idle loop (_post_sync) — fires every idle_interval_seconds while
the workspace has no active task (#205 pattern). Same proxy path,
same missing headers, same silent 401 in multi-tenant mode.
Both now build headers as
{"Content-Type": "application/json", **auth_headers()}
auth_headers() returns {"Authorization": "Bearer <token>"} when
/auth-token.txt exists, empty dict otherwise (first boot before
register issues the token). The existing lazy-bootstrap fail-open
on the platform side covers the empty-dict case.
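A sketch of the auth_headers() contract described above (the token
path comes from this text; the function body is illustrative):

```python
import os

TOKEN_PATH = "/auth-token.txt"  # path taken from the description above

def auth_headers(path: str = TOKEN_PATH) -> dict:
    # empty dict on first boot, Bearer header once a token is persisted
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        token = f.read().strip()
    return {"Authorization": f"Bearer {token}"} if token else {}

# both self-post paths now merge it into their headers:
headers = {"Content-Type": "application/json", **auth_headers()}
```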
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #226 (MEDIUM). WorkspaceHandler.Create joined payload.Template
directly into filepath.Join(configsDir, template) without validating
it stayed inside configsDir. An attacker posting Template="../../etc"
would have the provisioner walk and mount arbitrary host directories
into the workspace container.
Same fix as #103 (POST /org/import): use the existing resolveInsideRoot
helper to reject absolute paths and any ".." that escapes the root.
Applied at both call sites in workspace.go:
1. Synchronous runtime detection before DB insert — 400 on bad input
2. Async provisioning goroutine — early return, logs the rejection
(belt-and-suspenders; the create path already blocks)
No test added inline because the existing resolveInsideRoot suite
(org_path_test.go) already covers absolute / traversal / prefix-sibling
/ empty-path / deep-subpath cases. A duplicate test for the workspace
handler wouldn't add signal.
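resolveInsideRoot is an existing Go helper; its contract, sketched in
Python (the body below is illustrative, not the repo's code):

```python
import os

def resolve_inside_root(root: str, rel: str) -> str:
    # reject absolute paths and any ".." that escapes the root
    if os.path.isabs(rel):
        raise ValueError("absolute path rejected")
    root_abs = os.path.abspath(root)
    full = os.path.abspath(os.path.join(root_abs, rel))
    # the os.sep suffix blocks prefix-sibling escapes
    # (/configs-evil passes a bare startswith("/configs") check)
    if full != root_abs and not full.startswith(root_abs + os.sep):
        raise ValueError("path escapes root")
    return full
```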
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The original fix stripped \n/\r but left the rest in place, then relied
on a substring-based test which was over-strict (the escaped fragment
still contained the banned substring as bytes).
Better approach: emit the name as a double-quoted YAML scalar with all
escape sequences (\\, \", \n, \r, \t) handled inline. This is the
canonical YAML-safe way to embed user input — no injection possible
because every control character is either escaped or rejected by the
YAML parser inside the scalar context.
Test rewritten to parse the output as YAML and verify:
1. parsed["name"] equals the literal attacker input (payload preserved)
2. no banned top-level keys leaked to the parsed map
3. legitimate default keys (description/version/tier/model) still present
Updated the two existing tests that asserted the unquoted name format.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses items 4, 5, 7 from the self-review of the batch merge. PR A
(#228) covered items 1, 2, 3, 6 on the Go side.
## workspace-template/main.py — idle loop hardening
- Replace asyncio.get_event_loop() with asyncio.get_running_loop() —
the former is deprecated in 3.12+ and emits a DeprecationWarning on
every idle fire.
- Replace hardcoded urlopen timeout=600 with IDLE_FIRE_TIMEOUT_SECONDS
clamped to max(60, min(300, idle_interval_seconds)). Long cadence
workspaces no longer hold dangling requests open for 10 minutes; the
cap adapts automatically when the interval is short.
- Type the exception handling: split HTTPError (has .code) from URLError
(connection-level) from the generic catch-all. Log status + error
class separately so operators can grep for specific failure modes
instead of a bare "post failed".
- Fire-and-forget no longer loses exceptions. The run_in_executor Future
  now has an add_done_callback that logs the outcome, so an exception in
  _post_sync surfaces as "Idle loop: post failed — status=None err=..."
  instead of Python's default "Task exception was never retrieved"
  warning buried in stderr.
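The clamp and the done-callback pattern, condensed into a runnable
sketch (names mirror the description; surrounding details assumed):

```python
import asyncio

def idle_fire_timeout(idle_interval_seconds: int) -> int:
    # IDLE_FIRE_TIMEOUT_SECONDS: never below 60s, capped at 300s
    return max(60, min(300, idle_interval_seconds))

def _log_outcome(fut: asyncio.Future) -> None:
    # surfaces fire-and-forget failures instead of a silent
    # "Task exception was never retrieved" warning
    exc = fut.exception()
    if exc is not None:
        print(f"Idle loop: post failed — status=None err={exc!r}")

async def fire_idle_post(post_sync) -> None:
    loop = asyncio.get_running_loop()  # not the deprecated get_event_loop()
    fut = loop.run_in_executor(None, post_sync)
    fut.add_done_callback(_log_outcome)
    await asyncio.gather(fut, return_exceptions=True)
```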
## org-templates/molecule-dev/org.yaml — discoverability
Added idle_prompt + idle_interval_seconds to the defaults: block with
explanatory comments. Without this, users had to read main.py to
discover the feature.
## docs/runbooks/admin-auth.md — new
Documents the three middleware variants (AdminAuth strict,
CanvasOrBearer soft, WorkspaceAuth per-id), the exact contract of each,
and the three-question test for adding a new route to CanvasOrBearer.
Also flags the session-cookie follow-up as Phase H.
Referenced PRs: #138, #164, #165, #166, #167, #168, #190, #194, #203,
#228.
No code deltas in platform/ beyond the Python + YAML + docs changes.
Full pytest suite unchanged except the pre-existing test_hermes_smoke
flake that fails in full-suite but passes in isolation (test isolation
bug, not introduced by this PR).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Context: when the claude-agent-sdk wraps a stream error from the CLI
subprocess that it can't categorize (rate limit, auth, network), it
raises a bare `Exception("Command failed with exit code 1\nError output:
Check stderr output for details")`. The exception has no `.stderr` or
`.exit_code` attributes, so #66's `_format_process_error` — which reads
those attributes — has nothing to surface. The log line becomes:
SDK agent error [claude-code]: Exception: Command failed with exit
code 1 (exit code: 1)\nError output: Check stderr output for details
That's the placeholder text from the SDK's error path, not the actual
error. Operators chasing a stuck workspace are forced to `docker exec
ws-xxx claude --print` manually to discover the real cause. Observed
today during the rate-limit incident: every PM error line was identical
"Check stderr output for details" while the real cause ("You've hit
your limit · resets Apr 17, 11pm (UTC)") was only visible via manual
reproduction — that cost ~20 minutes of diagnosis time.
## Fix
Add `_probe_claude_cli_error()`: a best-effort subprocess call that runs
`claude --print` with a small probe input, captures stderr+stdout, and
returns the real error string. Bounded by 30s timeout so a hung CLI
can't stall the error path.
Extend `_format_process_error` with ONE narrow fallback: if the
exception has no stderr/exit_code AND its message contains the specific
"Check stderr output for details" marker, call the probe and append
`probed_cli_error=<real error>` to the formatted line.
Critically: the probe only runs in the narrow case where we have
nothing else to log. If `.stderr` or `.exit_code` are present (the
normal ProcessError path from #66), the probe is skipped — no wasted
subprocess, no 30s latency on every error.
## Test coverage
`workspace-template/tests/test_claude_sdk_executor.py` adds 3 new tests:
- `test_format_process_error_probes_cli_when_stderr_swallowed` — the
happy path: exception matches the marker, probe runs, result appears
in the formatted line. Probe is monkeypatched so no subprocess spawns
in the test.
- `test_format_process_error_does_not_probe_when_stderr_already_present` —
negative: regular ProcessError with `.stderr` set does NOT trigger
the probe (skip the wasted call).
- `test_format_process_error_does_not_probe_without_swallowed_marker` —
negative: unrelated plain exceptions (e.g. RuntimeError) do NOT
trigger the probe (so the common-case error path stays fast).
All 7 `_format_process_error` tests pass locally (4 existing + 3 new):
```
pytest tests/test_claude_sdk_executor.py -k format_process_error
======================= 7 passed in 0.06s ========================
```
## Impact
Next time the SDK swallows a real error (rate limit, auth failure,
network outage), the workspace log will contain the actual error string
alongside the generic placeholder:
SDK agent error [claude-code]: Exception: Command failed with exit
code 1 ... | probed_cli_error="You've hit your limit · resets Apr
17, 11pm (UTC)"
Diagnosis time drops from "docker exec each ws, run claude --print,
read stderr" (~20 min) to "grep probed_cli_error in platform logs"
(~10 seconds).
Closes #160.
Addresses self-review of the 10-PR batch merged earlier this session.
Splits the follow-ups into this Go-side PR and a later Python/docs PR.
## Fixes
1. wsauth_middleware.go CanvasOrBearer — invalid bearer now hard-rejects
with 401 instead of falling through to the Origin check. Previous code
let an attacker with an expired token + matching Origin bypass auth.
Empty bearer still falls through to the Origin path (the intended
canvas path).
2. scheduler.go short() helper — extracts safe UUID prefix truncation.
Pre-existing unsafe [:12] and [:8] slices would panic on workspace IDs
shorter than the bound. #115's new skip path had the bounds check;
the happy-path log lines did not. One helper, three call sites.
3. activity.go security-event log on source_id spoof — #209 added the
403 but the attempt was invisible to any auditor cron. Stable
greppable log line with authed_workspace, body_source_id, client IP.
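short() is a Go helper; its bounds-safety contract in a Python sketch
(Python slicing wouldn't panic, so the point is the explicit bound):

```python
def short(workspace_id: str, n: int = 8) -> str:
    # safe prefix truncation: Go's id[:8] panics when len(id) < 8;
    # this checks the bound first
    return workspace_id if len(workspace_id) <= n else workspace_id[:n]
```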
## New tests
- TestShort_helper — bounds-safety regression guard for the helper
- TestRecordSkipped_writesSkippedStatus — #115 coverage gap, exercises
UPDATE + INSERT via sqlmock
- TestRecordSkipped_shortWorkspaceIDNoPanic — short-ID crash regression
- TestActivityHandler_Report_SourceIDSpoofRejected — #209 403 path
- TestActivityHandler_Report_MatchingSourceIDAccepted — non-spoof path
- TestHistory_IncludesErrorDetail — #152 problem B coverage
go test -race ./... green locally.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 13K-line plugins_install_pipeline.go had zero unit tests, making it
the highest-regression-risk file in the platform handlers package.
New test file covers all testable pure-function and integration paths
that do not require a live Docker daemon:
validatePluginName (8 cases)
- valid names, empty, forward slash, backslash, "..", embedded "..";
path-traversal variants ("../etc", "../../secrets")
dirSize (6 cases)
- empty dir, single file, multiple files, nested subdirectory,
exceeds limit (verifies error mentions "cap"), exactly at limit
httpErr / newHTTPErr (3 cases)
- Error() contains status code, all relevant HTTP codes preserved,
errors.As unwraps through fmt.Errorf %w chains
regexpEscapeForAwk (6 cases)
- alphanumeric names unchanged, slash escaped, dot escaped, + escaped,
full "# Plugin: name /" marker (space not escaped), backslash escaped
streamDirAsTar (4 cases)
- empty dir yields zero entries, single file round-trips content,
nested directory preserves relative path, entries have no absolute
or tempdir-leaking paths
resolveAndStage via stubResolver (10 cases)
- empty source → 400, unknown scheme → 400, happy path (result fields),
staged dir cleaned on fetch error, ErrPluginNotFound → 404,
DeadlineExceeded → 504, generic error → 502, resolver returns invalid
name → 400, local:// path traversal → 400 (pre-Fetch validation)
stubResolver implements plugins.SourceResolver as an in-process test
double — no network, no filesystem side-effects beyond the staging tempdir
that resolveAndStage creates and cleans up.
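The validatePluginName cases above imply a contract like this Python
sketch (the real function is Go; rejection set taken from the case list):

```python
def validate_plugin_name(name: str) -> bool:
    # reject empties, path separators, and any ".." (even embedded)
    if not name:
        return False
    return not any(bad in name for bad in ("/", "\\", ".."))
```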
Closes #217
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The register call was missing headers=auth_headers(), so workspaces that
already have a persisted token (i.e. every restart after the first boot)
were sending an unauthenticated request. The platform's register handler
returns 401 for requests missing a valid bearer token once a token has
been issued, causing re-registration to fail on every restart.
Import auth_headers at the module level (alongside the existing save_token
inline import) and pass it to the httpx POST. auth_headers() returns {}
when no token is on file yet (first boot), so there is no regression for
fresh workspaces — the platform still issues a token on the 200 response
and save_token() persists it for all subsequent restarts.
Closes #215
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A crafted workspace name containing a newline (e.g. "x\nmodel: evil")
could inject arbitrary YAML keys into the auto-generated config.yaml.
Strip \n and \r from the name before interpolation. YAML key injection
requires a newline to start a new mapping entry; other characters such
as `:` are safe in unquoted scalar values.
Adds TestGenerateDefaultConfig_YAMLInjection with three adversarial
inputs: bare \n injection, CRLF injection, and multi-key injection.
Closes #221
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Every agent in the reno-stars org (marketing, sales, dev, coordinator)
plausibly needs browser access at some point — social posts, GBP edits,
directory submissions, InvoiceSimple publish. Without the plugin on
first import, agents fall back to launching their own Chromium inside
the container, which doesn't have the operator's authenticated Chrome
profile (no logged-in sessions, no saved cookies).
Per-agent opt-out via `!browser-automation` is already supported
(PR #71 UNION merge semantics) if any specific role shouldn't have it.
Closes #213
PR #205 shipped the workspace idle-loop mechanism (reflection-on-completion
pattern from the Hermes/Letta research survey) but deliberately added NO
default idle_prompt in org.yaml so rollout could be measured one workspace
at a time before going team-wide.
This is that first opt-in: Technical Researcher gets a backlog-pull + reflect
idle prompt on a 10-minute cadence.
## Why TR first
- Research-heavy role with a naturally bursty load — lots of idle time
between the once-per-hour plugin curation cron fires
- Non-user-facing (no canvas UI impact, no UX risk)
- Already has a clear backlog shape: the plugin curation cron produces
findings that could feed follow-up studies
- Vision-free (no Playwright) so cost per idle tick is pure text
## What the idle_prompt does
Three-step reflection, under 60s wall-clock, max 1 A2A send per tick:
1. **Backlog pull** — search_memory "research-backlog:technical-researcher"
for any stashed research questions (from prior cron fires or Research
Lead delegations). If found → delegate_task to Research Lead with a
concrete deliverable spec, then commit_memory to remove the item from
the backlog.
2. **Reflection fallback** — if backlog is empty, look at the last memory
entry from the Hourly plugin curation cron. Does it surface a follow-up
study worth doing? If yes → file a GH issue labeled `research` and
commit_memory to put the question on the backlog for next tick.
3. **Idle-clean outcome** — if neither backlog nor reflection produced
anything, write "tr-idle HH:MM — clean" to memory and stop. No busy work.
Hard rules enforce: max 1 A2A per tick, skip step 1 if Research Lead busy,
under 60s wall-clock, never re-run a cron's own prompt from inside the idle
loop.
## Rollout plan
- **This PR**: enables TR only via the `idle_prompt` + `idle_interval_seconds`
fields added to its workspace entry in org.yaml.
- **Next 24h**: measure activity_logs delta on TR vs baseline, count
idle-fired delegations vs idle-clean outcomes, confirm Research Lead
isn't being flooded.
- **If green** (delegations land useful work, no flood): roll to Market
Analyst + Competitive Intelligence in a follow-up PR.
- **If noisy** (too many idle fires producing nothing): tune idle_interval
up to 1200-1800s.
## Apply locally per feedback rule
Per `feedback_apply_template_locally_too.md`: not waiting for merge. After
pushing this PR I'll edit TR's live /configs/config.yaml to add the same
idle_prompt + idle_interval_seconds fields, then restart ws-57e13b54-119
(Technical Researcher) so the new workspace-template binary picks up the
idle loop immediately. Measurement clock starts from that restart.
## Related
- #205 (mechanism) — just merged in this cycle (7f11328)
- #208 Hermes Phase 1 — also just merged (be53a33)
- docs/ecosystem-watch.md → `### Hermes Agent` — reflection-on-completion
pattern reference
Closes #211 (HIGH, ops/security). RunMigrations globbed `*.sql`, which
matches both `.up.sql` AND `.down.sql`. Alphabetical sort puts "d"
before "u", so every platform boot ran the rollback BEFORE the forward
migration for any pair starting with migration 018.
Net effect: every restart wiped workspace_auth_tokens (the 020 pair),
which in turn regressed AdminAuth to its fail-open bootstrap bypass for
every route protected by it — the live server was effectively
unauthenticated from restart until the next workspace re-registered.
Also wiped 018_secrets_encryption_version and 019_workspace_access
pairs silently.
Fix is a 3-line filter: skip files whose base name ends in `.down.sql`.
Down migrations remain on disk for operator-driven rollback via psql,
but are never picked up by the auto-run loop.
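The filter, sketched in Python (the real runner is Go; the suffix test
is the whole fix):

```python
def forward_migrations(filenames):
    # keep *.sql but never *.down.sql; legacy plain .sql files still run
    return sorted(
        f for f in filenames
        if f.endswith(".sql") and not f.endswith(".down.sql")
    )
```

The bug is visible in the sort itself: with a bare `*.sql` glob,
`020_x.down.sql` sorts before `020_x.up.sql`, so the rollback ran first.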
Added a unit test against a tmp dir to lock the filter behaviour so it
can never regress: the test stages a mix of legacy plain .sql files and
matched up/down pairs, and asserts only forward files survive.
Follow-up (not in this PR): the runner still re-applies every migration
on every boot. Migrations must be idempotent. A proper schema_migrations
tracking table is tracked as a future cleanup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #204. PR #198 wired push_sender=PushNotificationSender() into
DefaultRequestHandler to satisfy #175's push-notification capability,
but PushNotificationSender in a2a-sdk is an abstract base class and
cannot be instantiated. Every workspace container crashed on startup
with TypeError.
Reverted to DefaultRequestHandler's defaults. The pushNotifications
capability still appears in AgentCard.capabilities (advertised to A2A
clients) but actual implementation of the sender is deferred to a
Phase-H follow-up that subclasses PushNotificationSender properly.
Existing pytest suite unchanged (the crash was only at runtime on
main.py import, which no existing test exercises directly).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cherry-picks the one genuinely new fix from #169 after confirming the
rest of that PR is already covered on main (C1/C3/C5 by wsAuth group,
C6 by #94+#119 SSRF blocklist, C4 ownership by existing WHERE filter).
Pre-existing middleware (WorkspaceAuth on /workspaces/:id/* sub-routes)
proves the caller owns the :id path param. But the body field
source_id was never validated — a workspace authenticated for its own
/activity endpoint could still attribute logs to a different workspace
by setting source_id=<foreign UUID>. Rejected with 403 now.
No schema change, no new middleware. 4-line handler delta. Closes the
only real gap in #169; #169 itself will be closed as superseded.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ships the first half of the queued Hermes adapter expansion. PR 2 only
supported Nous Portal + OpenRouter; this adds 13 more providers reachable
via OpenAI-compat endpoints. Native SDK paths for Anthropic + Gemini are
Phase 2 (better tool-calling + vision fidelity).
## What's new
**`workspace-template/adapters/hermes/providers.py`** (new file, 220 LOC):
- ``ProviderConfig`` dataclass: name, env vars, base URL, default model, auth scheme, docs
- ``PROVIDERS`` dict with 15 entries across 4 groups:
- PR 2 baseline: nous_portal, openrouter
- Frontier commercial: openai, anthropic, xai, gemini
- Chinese providers: qwen, glm, kimi, minimax, deepseek
- OSS/alt: groq, together, fireworks, mistral
- ``RESOLUTION_ORDER`` tuple: priority for auto-detect (back-compat first,
then commercial, then Chinese, then OSS/alt)
- ``resolve_provider(explicit=None)`` -> (ProviderConfig, api_key)
- With explicit name: routes to that provider, raises if env var empty
- Without: walks RESOLUTION_ORDER, first env-var-set provider wins
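The resolution contract, condensed into a two-entry sketch (URLs,
models, and the trimmed registry here are placeholders, not the real
values):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class ProviderConfig:
    name: str
    env_vars: tuple
    base_url: str       # placeholder endpoints below
    default_model: str

PROVIDERS = {
    "nous_portal": ProviderConfig("nous_portal", ("HERMES_API_KEY",),
                                  "https://portal.example/v1", "hermes-model"),
    "openrouter": ProviderConfig("openrouter", ("OPENROUTER_API_KEY",),
                                 "https://openrouter.example/v1", "or-model"),
}
RESOLUTION_ORDER = ("nous_portal", "openrouter")  # back-compat first

def resolve_provider(explicit=None):
    if explicit is not None:
        cfg = PROVIDERS[explicit]  # unknown name raises
        for var in cfg.env_vars:
            if os.environ.get(var):
                return cfg, os.environ[var]
        raise RuntimeError(f"{explicit}: no key set in {cfg.env_vars}")
    for name in RESOLUTION_ORDER:  # first env-var-set provider wins
        cfg = PROVIDERS[name]
        for var in cfg.env_vars:
            if os.environ.get(var):
                return cfg, os.environ[var]
    raise RuntimeError("no provider env var set")
```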
**`workspace-template/adapters/hermes/executor.py`** (refactored):
- `create_executor(hermes_api_key=None, provider=None, model=None)` now has
three parameters:
- `hermes_api_key`: PR 2 back-compat — routes to Nous Portal
- `provider`: canonical short name from the registry (e.g. "anthropic")
- `model`: optional override of the provider's default model
- Delegates all resolution to `providers.resolve_provider()` — no more
hardcoded URLs or env var lookups in the executor itself
- `HermesA2AExecutor.__init__` no longer has Nous-specific defaults; callers
pass base_url + model explicitly (which create_executor always does)
**`workspace-template/tests/test_hermes_providers.py`** (new file, 26 tests):
- Registry shape invariants (count >= 15, no duplicates, every config valid)
- PR 2 back-compat: HERMES_API_KEY / OPENROUTER_API_KEY still route correctly
- Auto-detect for every provider in the registry (parametrized — guards against
typos in env var lists)
- Explicit `provider=` bypass of auto-detect
- Error cases: unknown provider, explicit-but-empty, auto-detect-with-no-env
- All 26 tests pass locally in 0.08s
## Back-compat guarantees
| Scenario | PR 2 behavior | This PR behavior |
|---|---|---|
| `create_executor(hermes_api_key="x")` | Nous Portal | Nous Portal (unchanged) |
| `HERMES_API_KEY=x` env, auto-detect | Nous Portal | Nous Portal (unchanged) |
| `OPENROUTER_API_KEY=x` env, auto-detect | OpenRouter | OpenRouter (unchanged) |
| Both env + explicit hermes_api_key param | Nous Portal (param wins) | Nous Portal (param wins, unchanged) |
Nothing existing can break. New callers gain access to 13 more providers.
## What's NOT in this PR (Phase 2)
- **Native Anthropic Messages API path** — better tool calling, vision, extended
thinking. Requires pulling in `anthropic` SDK. ~50 LOC.
- **Native Gemini generateContent path** — for vision + google tools. Requires
`google-genai` SDK. ~50 LOC.
- **Streaming support across all providers** — current executor is non-streaming
(single chat.completions.create call). Streaming works with openai.AsyncOpenAI
but hasn't been wired to the A2A event queue path. ~30 LOC.
- **Per-provider model overrides in config.yaml** — Phase 1 uses the registry's
default_model. Phase 2 adds a `hermes: { provider: qwen, model: qwen3-coder-plus }`
block in the workspace config.
- **`.env.example` updates** — not critical since the registry itself documents
every env var via the `env_vars` field, but nice-to-have.
## Related
- Queued memory: `project_hermes_multi_provider.md`
- CEO directive 2026-04-15: *"once current works are cleared, I want you to
focus on supporting hermes agent, right now it doesnt take too much providers"*
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's eco-watch
entry listed "Nous Portal, OpenRouter, GLM, Kimi, MiniMax, OpenAI, …" which
shaped this registry's initial set
## Test plan
- [x] Unit tests: 26/26 pass locally (pytest)
- [ ] CI will run on the self-hosted macOS arm64 runner
- [ ] Smoke test in a real workspace: set QWEN_API_KEY and verify Technical
Researcher actually hits Alibaba DashScope successfully
- [ ] Integration test per provider with real API keys (gated on env, skip
when not set — Phase 2 CI addition)
Closes #115. The Security Auditor hourly cron (and likely others) hit a
~36% miss rate because the platform's A2A proxy rejected fires with
"workspace agent busy — retry after a short backoff" while the agent was
still executing the prior audit. That error was recorded as a hard
failure and polluted last_error.
New behaviour:
Before fireSchedule calls into the A2A proxy, it reads
workspaces.active_tasks for the target. If >0, it:
- Advances next_run_at to the next cron slot (cron keeps ticking)
- Bumps run_count
- Sets last_status='skipped' + last_error=<reason>
- Inserts a cron_run activity_logs row with status='skipped' + error_detail
- Broadcasts CRON_SKIPPED for canvas + operators
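The busy-check control flow, sketched (helper names are mine; the real
code is Go inside fireSchedule):

```python
def fire_schedule(active_tasks, fire, record_skip, next_cron_slot):
    # busy-collision is a skip, not an error
    if active_tasks > 0:
        record_skip(status="skipped",
                    error="workspace agent busy — retry after a short backoff",
                    next_run_at=next_cron_slot)
        return "skipped"
    return fire()
```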
Effect: busy-collision ceases to be an error. The history surface now
distinguishes "ran and failed" from "skipped because busy". Operators
can tell the difference at a glance, and the liveness view doesn't
stall waiting for the next ticker cycle.
Pairs with #149 (dedicated heartbeat pulse) and #152 problem B
(error_detail surfaced in history) for a coherent scheduler story.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #152 problem B (schedule history API drops error detail).
Two tiny changes:
1. scheduler.fireSchedule now writes lastError into activity_logs.error_detail
when inserting the cron_run row. Previously the column was left NULL even
on failure because the INSERT didn't include it.
2. schedules.History SELECT now reads error_detail and includes it in the
JSON response under error_detail. Frontend + audit cron can now display
"why did this run fail" instead of just "status=error".
No schema change — activity_logs.error_detail already exists from
migration 009. This just starts using the column.
Problem A of #152 (Research Lead ecosystem-watch 50% error rate on its
own) is a separate ops investigation and stays open.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Today's multi-framework research (Hermes, Letta, Trigger.dev, Inngest, AG2,
Rivet, n8n, Composio, SWE-agent — see docs/ecosystem-watch.md) confirmed
that nobody runs while(true) per agent. The working patterns are:
(a) event-driven + hibernation (Hermes, Letta, Trigger.dev, Inngest)
(b) cron/user-triggered ephemeral runs (AG2, Rivet, n8n, SWE-agent)
Molecule AI is currently 100% in category (b). Observed team utilization:
~0.5% — agents idle 99.5% of the time because cron fires and CEO-typed
A2A messages are the only initiating signals. The CEO's north star is
24/7 iteration; the current cadence falls short.
This PR closes the gap by adding an in-workspace idle loop that wakes the
agent periodically ONLY when it has no active task. The shape is the
Hermes reflection-on-completion pattern combined with the Letta backlog-pull
pattern, collapsed into a ~60 LOC change in the workspace-template. Zero
new Go code. Zero new DB tables. Zero new API endpoints.
## How it works
1. `config.py` gets two new fields on WorkspaceConfig:
- `idle_prompt: str = ""` — the prompt to self-send when idle
- `idle_interval_seconds: int = 600` — how often to check (default 10 min)
Both support inline or file ref (matching the initial_prompt pattern).
2. `main.py` spawns an `_run_idle_loop()` asyncio task alongside the
existing initial_prompt task (same lifecycle hooks — cancelled in the
`finally:` of the server.serve() block).
3. The loop body:
a. Sleep interval
b. Check `heartbeat.active_tasks == 0` LOCALLY (no LLM call, no HTTP)
c. If idle → self-POST the idle_prompt via the existing /workspaces/{id}/a2a proxy
d. Loop
The agent's own concurrency control rejects the post if it becomes busy
between the check and the POST — that's the safety valve.
4. Gated on `config.idle_prompt` being non-empty. Default = "" = no loop.
Existing workspaces upgrade silently as no-ops until someone explicitly
opts in by setting idle_prompt in org.yaml (either defaults: or
per-workspace:).
## Cost analysis (from the research report)
- while(true) pattern: ~$93/day/org (12 agents × 12 thinks/hour × $0.027). Unshippable.
- Hermes reflection-on-completion: ~$0.45/day/org. Cost ∝ useful work.
- This PR's idle loop at 10-min cadence: upper bound 12 × 6/hour × 24h
× ~3k tokens × Sonnet rate ≈ $5/day/org PER ROLE, only if they're
genuinely idle every check. In practice far less because busy periods
skip the LLM call entirely (the active_tasks check is local).
## Rollout plan
Research report recommended rolling to ONE workspace first (Technical
Researcher) and measuring 24h of activity_logs before enabling for
all 12. This PR enables the mechanism; it does NOT add any default
idle_prompt to org-templates/molecule-dev/org.yaml. That's a follow-up
PR after this one lands and one workspace has been manually opted in
for measurement.
## Not touched in this PR
- No Go code (no new platform endpoint, no new DB columns)
- No org.yaml changes (zero-impact until someone opts in)
- No scheduler changes (the idle loop is a workspace concern, not a
scheduler concern — matches the research report's layering)
## Test plan
- [x] Python syntax check (ast.parse) on main.py + config.py
- [ ] Unit test: WorkspaceConfig parses idle_prompt / idle_interval_seconds from yaml
- [ ] Integration test: set idle_prompt on Technical Researcher, measure that
an A2A message is received every ~10 min while idle, and NOT received
while busy with a delegation
- [ ] Dogfood: enable on Technical Researcher for 24h, count activity_logs
delta vs baseline, confirm cost stays within model
## Related
- Today's research report (conversation output, summarized in commit trailer)
- docs/ecosystem-watch.md → `### Hermes Agent` (the canonical reflection-on-completion example)
- #159 orchestrator/worker split — complementary: leaders pulse for dispatch,
workers idle-loop for pull. Together: leaders push work, workers pull work,
no role ever sits idle with a cold queue.
Closes #168 by the route-split path from #194's review. #167 put PUT
/canvas/viewport behind strict AdminAuth, breaking canvas drag/zoom
persist because the canvas uses session cookies not bearer tokens.
New narrow middleware CanvasOrBearer:
- Accepts a valid bearer (same contract as AdminAuth) OR
- Accepts a request whose Origin exactly matches CORS_ORIGINS
- Lazy-bootstrap fail-open preserved for fresh installs
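The decision order, sketched (the real middleware is Go; token and
Origin checks are simplified to set membership, and the lazy-bootstrap
fail-open path is omitted):

```python
def canvas_or_bearer(bearer, origin, valid_tokens, cors_origins):
    if bearer:
        # a PRESENT but invalid bearer hard-rejects; it must not
        # fall through to the softer Origin check
        return 200 if bearer in valid_tokens else 401
    if origin and origin in cors_origins:
        return 200  # the intended canvas path
    return 401
```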
Applied ONLY to PUT /canvas/viewport. The softer check is acceptable
there because viewport corruption is cosmetic-only — worst case a
user refreshes the page. This middleware must NOT be used on routes
that leak prompts (#165), create resources (#164), or write files
(#190) — see #194 review for why.
The other canvas-facing routes mentioned in #168 (Events tab, Bundle
Export/Import) remain behind strict AdminAuth pending a proper
session-cookie-accepting AdminAuth (#168 follow-up for Phase H).
6 new tests cover: bootstrap fail-open, no-creds 401, canvas origin
match, wrong origin 401, empty origin rejected, localhost default.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>