molecule-core

Author	SHA1	Message	Date
Hongming Wang	00626a41a5	Merge pull request #224 from Molecule-AI/fix/issue-221-yaml-injection fix(security): sanitize workspace name before YAML interpolation	2026-04-15 11:59:10 -07:00
Hongming Wang	dacd78b8f9	Merge pull request #231 from Molecule-AI/fix/160-sdk-error-probe fix(claude-sdk): #160 — probe CLI directly when SDK swallowed the real stderr	2026-04-15 11:58:59 -07:00
Hongming Wang	2616f2e4a1	Merge pull request #227 from Molecule-AI/test/issue-217-plugin-pipeline-tests test(handlers): unit test suite for plugins_install_pipeline.go	2026-04-15 11:58:56 -07:00
Hongming Wang	38fcb8a374	Merge pull request #225 from Molecule-AI/fix/issue-215-register-auth fix(workspace-template): add auth_headers() to /registry/register POST	2026-04-15 11:58:53 -07:00
Hongming Wang	6b9972f699	Merge pull request #216 from Molecule-AI/feat/tr-idle-prompt chore(template): enable idle-loop pilot on Technical Researcher (#205 follow-up)	2026-04-15 11:58:50 -07:00
Hongming Wang	4aef231d71	Merge pull request #223 from Molecule-AI/fix/reno-stars-browser-automation-default fix(reno-stars): default plugins to browser-automation	2026-04-15 11:58:46 -07:00
Hongming Wang	cb0205ed95	fix(security): #221 — quote name as YAML scalar instead of stripping newlines The original fix stripped \n/\r but left the rest in place, then relied on a substring-based test which was over-strict (the escaped fragment still contained the banned substring as bytes). Better approach: emit the name as a double-quoted YAML scalar with all escape sequences (\\, \", \n, \r, \t) handled inline. This is the canonical YAML-safe way to embed user input — no injection possible because every control character is either escaped or rejected by the YAML parser inside the scalar context. Test rewritten to parse the output as YAML and verify: 1. parsed["name"] equals the literal attacker input (payload preserved) 2. no banned top-level keys leaked to the parsed map 3. legitimate default keys (description/version/tier/model) still present Updated the two existing tests that asserted the unquoted name format. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:58:16 -07:00
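The double-quoted-scalar approach can be sketched in Python. This is an illustration, not the Go change in generateDefaultConfig: `yaml_quote` and `default_config` are hypothetical names, the non-name keys and values are placeholders, and only the five escapes listed above are handled (any other control character is left for the YAML parser to reject).

```python
def yaml_quote(value: str) -> str:
    """Render value as a YAML double-quoted scalar (hypothetical helper)."""
    escapes = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    # Each character is mapped once, so an escaped backslash is never re-escaped.
    return '"' + "".join(escapes.get(ch, ch) for ch in value) + '"'


def default_config(name: str) -> str:
    """Hypothetical stand-in for the generated config.yaml; only name is user input."""
    return (
        f"name: {yaml_quote(name)}\n"
        "description: placeholder\n"
        "tier: 1\n"
    )


# The injection payload stays inside one quoted scalar on one line.
print(default_config("x\nmodel: evil"))
```

Because the payload never leaves the quoted scalar, a YAML parser reads `name` back as the literal attacker string, and no `model` key appears at the top level.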
Hongming Wang	626fb3e803	Merge branch 'main' into fix/160-sdk-error-probe	2026-04-15 11:54:13 -07:00
Hongming Wang	1c0e3565af	Merge branch 'main' into test/issue-217-plugin-pipeline-tests	2026-04-15 11:54:12 -07:00
Hongming Wang	c730f6bc02	Merge branch 'main' into fix/issue-221-yaml-injection	2026-04-15 11:54:10 -07:00
Hongming Wang	d6fbd2aa04	Merge branch 'main' into fix/issue-215-register-auth	2026-04-15 11:54:09 -07:00
Hongming Wang	14ee966f2b	Merge branch 'main' into feat/tr-idle-prompt	2026-04-15 11:54:08 -07:00
Hongming Wang	dfb2f9626a	Merge branch 'main' into fix/reno-stars-browser-automation-default	2026-04-15 11:54:06 -07:00
Hongming Wang	2032b478ca	Merge pull request #232 from Molecule-AI/fix/code-review-idle-loop-and-docs fix(code-review): idle loop hardening + idle_prompt docs + admin-auth runbook	2026-04-15 11:52:06 -07:00
Hongming Wang	aab93de291	fix(code-review): idle loop hardening + idle_prompt docs + admin-auth runbook Addresses items 4, 5, 7 from the self-review of the batch merge. PR A (#228) covered items 1, 2, 3, 6 on the Go side. ## workspace-template/main.py — idle loop hardening - Replace asyncio.get_event_loop() with asyncio.get_running_loop() — the former is deprecated in 3.12+ and emits a DeprecationWarning on every idle fire. - Replace hardcoded urlopen timeout=600 with IDLE_FIRE_TIMEOUT_SECONDS clamped to max(60, min(300, idle_interval_seconds)). Long-cadence workspaces no longer hold dangling requests open for 10 minutes; the cap adapts automatically when the interval is short. - Type the exception handling: split HTTPError (has .code) from URLError (connection-level) from the generic catch-all. Log status + error class separately so operators can grep for specific failure modes instead of a bare "post failed". - Fire-and-forget no longer loses exceptions. run_in_executor Future now has an add_done_callback that logs the outcome, so an exception raised in _post_sync surfaces as "Idle loop: post failed — status=None err=..." instead of Python's default "Task exception was never retrieved" warning buried in stderr. ## org-templates/molecule-dev/org.yaml — discoverability Added idle_prompt + idle_interval_seconds to the defaults: block with explanatory comments. Without this, users had to read main.py to discover the feature. ## docs/runbooks/admin-auth.md — new Documents the three middleware variants (AdminAuth strict, CanvasOrBearer soft, WorkspaceAuth per-id), the exact contract of each, and the three-question test for adding a new route to CanvasOrBearer. Also flags the session-cookie follow-up as Phase H. Referenced PRs: #138, #164, #165, #166, #167, #168, #190, #194, #203, #228. No code deltas in platform/ beyond the Python + YAML + docs changes. 
Full pytest suite unchanged except the pre-existing test_hermes_smoke flake that fails in full-suite but passes in isolation (test isolation bug, not introduced by this PR). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:52:01 -07:00
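The timeout clamp and the done-callback pattern described above can be sketched as follows. `fire_idle` and `_log_outcome` are hypothetical names and the log format is illustrative; only the clamp formula and the HTTPError/URLError split come from the commit message.

```python
import asyncio
import logging
from urllib.error import HTTPError, URLError

logger = logging.getLogger("idle_loop")


def idle_fire_timeout(idle_interval_seconds: int) -> int:
    """Clamp the per-fire HTTP timeout to the range [60, 300] seconds."""
    return max(60, min(300, idle_interval_seconds))


def _log_outcome(fut: asyncio.Future) -> None:
    """Done-callback: surface executor failures instead of losing them."""
    if fut.cancelled():
        return
    exc = fut.exception()  # retrieving it marks the exception as handled
    if exc is None:
        return
    if isinstance(exc, HTTPError):  # check first: HTTPError subclasses URLError
        status, kind = exc.code, "http"
    elif isinstance(exc, URLError):
        status, kind = None, "connection"
    else:
        status, kind = None, type(exc).__name__
    logger.warning("Idle loop: post failed: status=%s kind=%s err=%r", status, kind, exc)


def fire_idle(post_sync, *args) -> asyncio.Future:
    """Fire-and-forget post_sync(*args) on the default executor.

    Must be called from code running inside the event loop.
    """
    loop = asyncio.get_running_loop()
    fut = loop.run_in_executor(None, post_sync, *args)
    fut.add_done_callback(_log_outcome)
    return fut
```

Checking HTTPError before URLError matters because of the subclass relationship; reversing the order would classify every HTTP status error as a connection failure.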
rabbitblood	0f2ed6bf0a	fix(claude-sdk): #160 — probe CLI directly when SDK swallowed the real stderr Context: when the claude-agent-sdk wraps a stream error from the CLI subprocess that it can't categorize (rate limit, auth, network), it raises a bare `Exception("Command failed with exit code 1\nError output: Check stderr output for details")`. The exception has no `.stderr` or `.exit_code` attributes, so #66's `_format_process_error` — which reads those attributes — has nothing to surface. The log line becomes: SDK agent error [claude-code]: Exception: Command failed with exit code 1 (exit code: 1)\nError output: Check stderr output for details That's the placeholder text from the SDK's error path, not the actual error. Operators chasing a stuck workspace are forced to `docker exec ws-xxx claude --print` manually to discover the real cause. Observed today during the rate-limit incident: every PM error line was identical "Check stderr output for details" while the real cause ("You've hit your limit · resets Apr 17, 11pm (UTC)") was only visible via manual reproduction — that cost ~20 minutes of diagnosis time. ## Fix Add `_probe_claude_cli_error()`: a best-effort subprocess call that runs `claude --print` with a small probe input, captures stderr+stdout, and returns the real error string. Bounded by 30s timeout so a hung CLI can't stall the error path. Extend `_format_process_error` with ONE narrow fallback: if the exception has no stderr/exit_code AND its message contains the specific "Check stderr output for details" marker, call the probe and append `probed_cli_error=<real error>` to the formatted line. Critically: the probe only runs in the narrow case where we have nothing else to log. If `.stderr` or `.exit_code` are present (the normal ProcessError path from #66), the probe is skipped — no wasted subprocess, no 30s latency on every error. 
## Test coverage `workspace-template/tests/test_claude_sdk_executor.py` adds 3 new tests: - `test_format_process_error_probes_cli_when_stderr_swallowed` — the happy path: exception matches the marker, probe runs, result appears in the formatted line. Probe is monkeypatched so no subprocess spawns in the test. - `test_format_process_error_does_not_probe_when_stderr_already_present` — negative: regular ProcessError with `.stderr` set does NOT trigger the probe (skip the wasted call). - `test_format_process_error_does_not_probe_without_swallowed_marker` — negative: unrelated plain exceptions (e.g. RuntimeError) do NOT trigger the probe (so the common-case error path stays fast). All 7 `_format_process_error` tests pass locally (4 existing + 3 new):
```
pytest tests/test_claude_sdk_executor.py -k format_process_error
======================= 7 passed in 0.06s ========================
```
## Impact Next time the SDK swallows a real error (rate limit, auth failure, network outage), the workspace log will contain the actual error string alongside the generic placeholder: SDK agent error [claude-code]: Exception: Command failed with exit code 1 ... | probed_cli_error="You've hit your limit · resets Apr 17, 11pm (UTC)" Diagnosis time drops from "docker exec each ws, run claude --print, read stderr" (~20 min) to "grep probed_cli_error in platform logs" (~10 seconds). Closes #160.	2026-04-15 11:50:55 -07:00
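A minimal sketch of the narrow probe fallback, assuming the probe can be injected as a callable (the tests above achieve the same by monkeypatching). `format_process_error`, `probe_claude_cli_error`, the `ping` probe prompt and the exact line format are illustrative, not the executor's real code; the subprocess call follows the `claude --print` invocation and 30 s bound described in this commit.

```python
import subprocess
from typing import Callable, Optional

SWALLOWED_MARKER = "Check stderr output for details"


def probe_claude_cli_error(timeout: float = 30.0) -> Optional[str]:
    """Best-effort: re-run the CLI once and return its real error text."""
    try:
        proc = subprocess.run(
            ["claude", "--print", "ping"],  # hypothetical probe prompt
            capture_output=True,
            text=True,
            timeout=timeout,  # a hung CLI cannot stall the error path
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"probe failed: {exc!r}"
    if proc.returncode == 0:
        return None  # CLI works now; nothing useful to report
    return (proc.stderr.strip() or proc.stdout.strip()) or None


def format_process_error(
    exc: BaseException,
    probe: Callable[[], Optional[str]] = probe_claude_cli_error,
) -> str:
    stderr = getattr(exc, "stderr", None)
    exit_code = getattr(exc, "exit_code", None)
    line = f"{type(exc).__name__}: {exc}"
    if exit_code is not None:
        line += f" (exit code: {exit_code})"
    if stderr:
        line += f" | stderr={stderr!r}"
    # Narrow fallback: probe only when the SDK left nothing else to log.
    if stderr is None and exit_code is None and SWALLOWED_MARKER in str(exc):
        probed = probe()
        if probed:
            line += f' | probed_cli_error="{probed}"'
    return line
```

The three guards mirror the three new tests: the probe fires only for the swallowed-marker case, never when `.stderr`/`.exit_code` exist, and never for unrelated exceptions.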
Hongming Wang	8aad65287a	Merge pull request #228 from Molecule-AI/fix/code-review-go-batch fix(code-review): Go-side follow-ups from self-review batch	2026-04-15 11:48:30 -07:00
Hongming Wang	410d2493d1	fix(code-review): CanvasOrBearer fall-through, scheduler short(), activity spoof log + 6 new tests Addresses self-review of the 10-PR batch merged earlier this session. Splits the follow-ups into this Go-side PR and a later Python/docs PR. ## Fixes 1. wsauth_middleware.go CanvasOrBearer — invalid bearer now hard-rejects with 401 instead of falling through to the Origin check. Previous code let an attacker with an expired token + matching Origin bypass auth. Empty bearer still falls through to the Origin path (the intended canvas path). 2. scheduler.go short() helper — extracts safe UUID prefix truncation. Pre-existing unsafe [:12] and [:8] slices would panic on workspace IDs shorter than the bound. #115's new skip path had the bounds check; the happy-path log lines did not. One helper, three call sites. 3. activity.go security-event log on source_id spoof — #209 added the 403 but the attempt was invisible to any auditor cron. Stable greppable log line with authed_workspace, body_source_id, client IP. ## New tests - TestShort_helper — bounds-safety regression guard for the helper - TestRecordSkipped_writesSkippedStatus — #115 coverage gap, exercises UPDATE + INSERT via sqlmock - TestRecordSkipped_shortWorkspaceIDNoPanic — short-ID crash regression - TestActivityHandler_Report_SourceIDSpoofRejected — #209 403 path - TestActivityHandler_Report_MatchingSourceIDAccepted — non-spoof path - TestHistory_IncludesErrorDetail — #152 problem B coverage go test -race ./... green locally. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:48:25 -07:00
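The hardened CanvasOrBearer decision can be modelled as a pure function. This is a Python sketch of the rules stated in this commit and in the #168 entry further down, not the Go middleware; it assumes the bootstrap fail-open is keyed on no tokens having been issued yet, and the origin used in the tests is a placeholder.

```python
from typing import Optional, Set


def canvas_or_bearer(
    bearer: Optional[str],
    origin: Optional[str],
    valid_tokens: Set[str],
    allowed_origins: Set[str],
) -> int:
    """Return an HTTP status: 200 lets the request through, 401 rejects it."""
    # Lazy-bootstrap fail-open (assumed trigger: no token issued yet).
    if not valid_tokens:
        return 200
    if bearer:
        # A presented-but-invalid bearer is rejected outright; it must NOT
        # fall through to the Origin check (the bypass fixed here).
        return 200 if bearer in valid_tokens else 401
    # Empty bearer: the intended canvas path, exact Origin match only.
    if origin and origin in allowed_origins:
        return 200
    return 401
```

Returning early on any non-empty bearer is the whole fix: an attacker holding an expired token plus a matching Origin now gets 401 instead of reaching the Origin branch.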
Dev Lead Agent	a3ce767822	test(handlers): add unit test suite for plugins_install_pipeline.go The 13K-line plugins_install_pipeline.go had zero unit tests, making it the highest-regression-risk file in the platform handlers package. New test file covers all testable pure-function and integration paths that do not require a live Docker daemon: validatePluginName (8 cases) - valid names, empty, forward slash, backslash, "..", embedded ".."; path-traversal variants ("../etc", "../../secrets") dirSize (6 cases) - empty dir, single file, multiple files, nested subdirectory, exceeds limit (verifies error mentions "cap"), exactly at limit httpErr / newHTTPErr (3 cases) - Error() contains status code, all relevant HTTP codes preserved, errors.As unwraps through fmt.Errorf %w chains regexpEscapeForAwk (6 cases) - alphanumeric names unchanged, slash escaped, dot escaped, + escaped, full "# Plugin: name /" marker (space not escaped), backslash escaped streamDirAsTar (4 cases) - empty dir yields zero entries, single file round-trips content, nested directory preserves relative path, entries have no absolute or tempdir-leaking paths resolveAndStage via stubResolver (10 cases) - empty source → 400, unknown scheme → 400, happy path (result fields), staged dir cleaned on fetch error, ErrPluginNotFound → 404, DeadlineExceeded → 504, generic error → 502, resolver returns invalid name → 400, local:// path traversal → 400 (pre-Fetch validation) stubResolver implements plugins.SourceResolver as an in-process test double — no network, no filesystem side-effects beyond the staging tempdir that resolveAndStage creates and cleans up. Closes #217 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 18:47:25 +00:00
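The name rules exercised by the validatePluginName cases can be sketched in Python. `validate_plugin_name` is a hypothetical mirror of the listed cases (empty, `/`, `\`, and any `..` substring), not a copy of the Go function.

```python
def validate_plugin_name(name: str) -> None:
    """Raise ValueError for names that could escape the plugins directory."""
    if not name:
        raise ValueError("plugin name is empty")
    if "/" in name or "\\" in name:
        raise ValueError(f"plugin name {name!r} contains a path separator")
    if ".." in name:
        # Covers the bare "..", embedded "..", and traversal prefixes like "../etc".
        raise ValueError(f"plugin name {name!r} contains '..'")


validate_plugin_name("browser-automation")  # a valid name passes silently
```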
Dev Lead Agent	20657e4e57	fix(workspace-template): include auth_headers() on /registry/register POST The register call was missing headers=auth_headers(), so workspaces that already have a persisted token (i.e. every restart after the first boot) were sending an unauthenticated request. The platform's register handler returns 401 for requests missing a valid bearer token once a token has been issued, causing re-registration to fail on every restart. Import auth_headers at the module level (alongside the existing save_token inline import) and pass it to the httpx POST. auth_headers() returns {} when no token is on file yet (first boot), so there is no regression for fresh workspaces — the platform still issues a token on the 200 response and save_token() persists it for all subsequent restarts. Closes #215 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 18:44:53 +00:00
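A sketch of the auth_headers() contract described above, assuming the token is persisted as plain text in a file; the file path is hypothetical and not taken from the workspace-template source.

```python
from pathlib import Path
from typing import Dict

TOKEN_FILE = Path("/configs/.auth_token")  # hypothetical location


def auth_headers(token_file: Path = TOKEN_FILE) -> Dict[str, str]:
    """Return a bearer header if a token was persisted, else {} (first boot)."""
    try:
        token = token_file.read_text().strip()
    except FileNotFoundError:
        return {}
    return {"Authorization": f"Bearer {token}"} if token else {}


# The fix is passing these headers on the register call, e.g. with httpx:
#   httpx.post(f"{platform_url}/registry/register", json=payload, headers=auth_headers())
```

Returning `{}` rather than raising keeps the first-boot path identical to before: the unauthenticated register succeeds and the issued token is saved for every later restart.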
Dev Lead Agent	afea61ae52	fix(security): sanitize body.Name before YAML interpolation in generateDefaultConfig A crafted workspace name containing a newline (e.g. "x\nmodel: evil") could inject arbitrary YAML keys into the auto-generated config.yaml. Strip \n and \r from the name before interpolation. YAML key injection requires a newline to start a new mapping entry; other characters such as `:` are safe in unquoted scalar values. Adds TestGenerateDefaultConfig_YAMLInjection with three adversarial inputs: bare \n injection, CRLF injection, and multi-key injection. Closes #221 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 18:44:11 +00:00
airenostars	a781d21f46	fix(reno-stars): default plugins to browser-automation Every agent in the reno-stars org (marketing, sales, dev, coordinator) plausibly needs browser access at some point — social posts, GBP edits, directory submissions, InvoiceSimple publish. Without the plugin on first import, agents fall back to launching their own Chromium inside the container, which doesn't have the operator's authenticated Chrome profile (no logged-in sessions, no saved cookies). Per-agent opt-out via `!browser-automation` is already supported (PR #71 UNION merge semantics) if any specific role shouldn't have it. Closes #213	2026-04-15 11:43:48 -07:00
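The per-agent opt-out can be illustrated with a small merge function. PR #71's exact UNION semantics are not described in this log, so `effective_plugins` is a hypothetical model: defaults and per-agent entries are unioned, and any `!name` entry removes `name`.

```python
from typing import Iterable, List


def effective_plugins(defaults: Iterable[str], per_agent: Iterable[str]) -> List[str]:
    """Hypothetical model: union of defaults and per-agent entries, minus `!name` opt-outs."""
    per_agent = list(per_agent)
    removed = {p[1:] for p in per_agent if p.startswith("!")}
    merged: List[str] = []
    for plugin in list(defaults) + [p for p in per_agent if not p.startswith("!")]:
        # Keep first occurrence order and drop anything explicitly opted out.
        if plugin not in removed and plugin not in merged:
            merged.append(plugin)
    return merged
```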
rabbitblood	2539f57f08	chore(template): enable idle-loop pilot on Technical Researcher (#205 follow-up) PR #205 shipped the workspace idle-loop mechanism (reflection-on-completion pattern from the Hermes/Letta research survey) but deliberately added NO default idle_prompt in org.yaml so rollout could be measured one workspace at a time before going team-wide. This is that first opt-in: Technical Researcher gets a backlog-pull + reflect idle prompt on a 10-minute cadence. ## Why TR first - Research-heavy role with a naturally bursty load — lots of idle time between the once-per-hour plugin curation cron fires - Non-user-facing (no canvas UI impact, no UX risk) - Already has a clear backlog shape: the plugin curation cron produces findings that could feed follow-up studies - Vision-free (no Playwright) so cost per idle tick is pure text ## What the idle_prompt does Three-step reflection, under 60s wall-clock, max 1 A2A send per tick: 1. Backlog pull — search_memory "research-backlog:technical-researcher" for any stashed research questions (from prior cron fires or Research Lead delegations). If found → delegate_task to Research Lead with a concrete deliverable spec, then commit_memory to remove the item from the backlog. 2. Reflection fallback — if backlog is empty, look at the last memory entry from the Hourly plugin curation cron. Does it surface a follow-up study worth doing? If yes → file a GH issue labeled `research` and commit_memory to put the question on the backlog for next tick. 3. Idle-clean outcome — if neither backlog nor reflection produced anything, write "tr-idle HH:MM — clean" to memory and stop. No busy work. Hard rules enforce: max 1 A2A per tick, skip step 1 if Research Lead busy, under 60s wall-clock, never re-run a cron's own prompt from inside the idle loop. ## Rollout plan - This PR: enables TR only via the `idle_prompt` + `idle_interval_seconds` fields added to its workspace entry in org.yaml. 
- Next 24h: measure activity_logs delta on TR vs baseline, count idle-fired delegations vs idle-clean outcomes, confirm Research Lead isn't being flooded. - If green (delegations land useful work, no flood): roll to Market Analyst + Competitive Intelligence in a follow-up PR. - If noisy (too many idle fires producing nothing): tune idle_interval up to 1200-1800s. ## Apply locally per feedback rule Per `feedback_apply_template_locally_too.md`: not waiting for merge. After pushing this PR I'll edit TR's live /configs/config.yaml to add the same idle_prompt + idle_interval_seconds fields, then restart ws-57e13b54-119 (Technical Researcher) so the new workspace-template binary picks up the idle loop immediately. Measurement clock starts from that restart. ## Related - #205 (mechanism) — just merged in this cycle (`54eb8d7`) - #208 Hermes Phase 1 — also just merged (`381a3c8`) - docs/ecosystem-watch.md → `### Hermes Agent` — reflection-on-completion pattern reference	2026-04-15 11:34:51 -07:00
Hongming Wang	56801ce05b	Merge pull request #212 from Molecule-AI/fix/issue-211-migration-runner-skips-down fix(db): #211 — migration runner skips *.down.sql (stop wiping data on boot)	2026-04-15 11:24:11 -07:00
Hongming Wang	a507961f22	fix(db): #211 — migration runner skips *.down.sql (stop wiping data on boot) Closes #211 HIGH ops/security. RunMigrations globbed `*.sql`, which matches both `*.up.sql` AND `*.down.sql`. Alphabetical sort puts "d" before "u", so every platform boot ran the rollback BEFORE the forward migration for any pair starting with migration 018. Net effect: every restart wiped workspace_auth_tokens (the 020 pair), which in turn regressed AdminAuth to its fail-open bootstrap bypass for every route protected by it — the live server was effectively unauthenticated from restart until the next workspace re-registered. Also wiped 018_secrets_encryption_version and 019_workspace_access pairs silently. Fix is a 3-line filter: skip files whose base name ends in `.down.sql`. Down migrations remain on disk for operator-driven rollback via psql, but are never picked up by the auto-run loop. Added unit test against a tmp dir to lock the filter behaviour so this can never regress: stages a mix of legacy plain .sql, matched up/down pairs, asserts only forward files survive. Follow-up (not in this PR): the runner still re-applies every migration on every boot, so migrations must be idempotent. A proper schema_migrations tracking table is tracked as a future cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:24:06 -07:00
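The filter and the ordering bug can be reproduced in a few lines of Python. `forward_migrations` is a hypothetical mirror of the Go fix, and the file names used below are placeholders.

```python
from pathlib import Path
from typing import List


def forward_migrations(migrations_dir: Path) -> List[Path]:
    """All *.sql files in lexical order, minus rollback (*.down.sql) files."""
    files = sorted(migrations_dir.glob("*.sql"))
    return [f for f in files if not f.name.endswith(".down.sql")]


# Why the bug bit: in a plain lexical sort "d" < "u", so the rollback came first.
print(sorted(["020_tokens.up.sql", "020_tokens.down.sql"]))
# ['020_tokens.down.sql', '020_tokens.up.sql']
```

Legacy plain `.sql` files still pass the filter, which is why the unit test stages them alongside matched up/down pairs.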
Hongming Wang	54eb8d7dab	Merge pull request #205 from Molecule-AI/feat/workspace-idle-loop feat(workspace): add idle-loop reflection pattern (Hermes/Letta shape, opt-in, ~90 LOC)	2026-04-15 11:21:47 -07:00
Hongming Wang	db36b5a97f	Merge remote-tracking branch 'origin/main' into feat/workspace-idle-loop	2026-04-15 11:21:15 -07:00
Hongming Wang	381a3c8774	Merge pull request #208 from Molecule-AI/feat/hermes-phase1-provider-registry feat(hermes): Phase 1 — multi-provider registry (15 providers, 26 tests, back-compat preserved)	2026-04-15 11:21:05 -07:00
Hongming Wang	8430c1ad98	Merge remote-tracking branch 'origin/main' into feat/hermes-phase1-provider-registry	2026-04-15 11:20:51 -07:00
Hongming Wang	012a3c075b	Merge branch 'main' into feat/hermes-phase1-provider-registry	2026-04-15 11:20:06 -07:00
Hongming Wang	e390fa060d	Merge pull request #210 from Molecule-AI/fix/issue-204-push-sender-abstract fix(workspace-template): #204 — drop PushNotificationSender (abstract class)	2026-04-15 11:18:57 -07:00
Hongming Wang	4f8577d2be	fix(workspace-template): #204 — drop PushNotificationSender (abstract class) Closes #204. PR #198 wired push_sender=PushNotificationSender() into DefaultRequestHandler to satisfy #175's push-notification capability, but PushNotificationSender in a2a-sdk is an abstract base class and cannot be instantiated. Every workspace container crashed on startup with TypeError. Reverted to DefaultRequestHandler's defaults. The pushNotifications capability still appears in AgentCard.capabilities (advertised to A2A clients) but actual implementation of the sender is deferred to a Phase-H follow-up that subclasses PushNotificationSender properly. Existing pytest suite unchanged (the crash was only at runtime on main.py import, which no existing test exercises directly). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:18:52 -07:00
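Why the containers crashed: Python refuses to instantiate a class that still has unimplemented abstract methods. The sketch uses a hypothetical `PushSender` base rather than the a2a-sdk class, whose abstract method names are not given here; a Phase-H subclass would need to implement every abstract method, as `LoggingPushSender` does.

```python
import asyncio
from abc import ABC, abstractmethod


class PushSender(ABC):
    """Hypothetical stand-in for an SDK abstract base class."""

    @abstractmethod
    async def send(self, task_id: str) -> None:
        ...


class LoggingPushSender(PushSender):
    """Concrete subclass: implements every abstract method, so it can be built."""

    def __init__(self) -> None:
        self.sent: list = []

    async def send(self, task_id: str) -> None:
        self.sent.append(task_id)


try:
    PushSender()  # what wiring the bare base class effectively did at startup
except TypeError as exc:
    print(f"TypeError: {exc}")

sender = LoggingPushSender()
asyncio.run(sender.send("task-1"))
```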
Hongming Wang	da20ae4717	Merge pull request #209 from Molecule-AI/fix/c2-source-id-spoof-check fix(security): C2 from #169 — reject spoofed source_id in activity.Report	2026-04-15 11:15:14 -07:00
Hongming Wang	a04f7c288d	fix(security): C2 from #169 — reject spoofed source_id in activity.Report Cherry-picks the one genuinely new fix from #169 after confirming the rest of that PR is already covered on main (C1/C3/C5 by wsAuth group, C6 by #94+#119 SSRF blocklist, C4 ownership by existing WHERE filter). Pre-existing middleware (WorkspaceAuth on /workspaces/:id/* sub-routes) proves the caller owns the :id path param. But the body field source_id was never validated — a workspace authenticated for its own /activity endpoint could still attribute logs to a different workspace by setting source_id=<foreign UUID>. Rejected with 403 now. No schema change, no new middleware. 4-line handler delta. Closes the only real gap in #169; #169 itself will be closed as superseded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:15:08 -07:00
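The spoof check amounts to one comparison in the handler. A Python sketch, assuming an empty source_id is accepted (the commit does not say) and using a hypothetical log format modelled on the security-event line later added in #228.

```python
import logging
from typing import Optional

log = logging.getLogger("activity")


def check_source_id(authed_workspace_id: str, body_source_id: Optional[str], client_ip: str) -> int:
    """Return 403 when the body claims a different workspace than the authed one."""
    if body_source_id and body_source_id != authed_workspace_id:
        log.warning(
            "security: source_id spoof rejected authed_workspace=%s body_source_id=%s ip=%s",
            authed_workspace_id, body_source_id, client_ip,
        )
        return 403
    return 200
```

WorkspaceAuth already proves ownership of the `:id` path parameter, so comparing the body field against that authenticated id is sufficient; no new middleware is needed.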
rabbitblood	376c9574a3	feat(hermes): Phase 1 — multi-provider registry (15 providers, back-compat preserved) Ships the first half of the queued Hermes adapter expansion. PR 2 only supported Nous Portal + OpenRouter; this adds 13 more providers reachable via OpenAI-compat endpoints. Native SDK paths for Anthropic + Gemini are Phase 2 (better tool-calling + vision fidelity). ## What's new `workspace-template/adapters/hermes/providers.py` (new file, 220 LOC): - `ProviderConfig` dataclass: name, env vars, base URL, default model, auth scheme, docs - `PROVIDERS` dict with 15 entries across 4 groups: - PR 2 baseline: nous_portal, openrouter - Frontier commercial: openai, anthropic, xai, gemini - Chinese providers: qwen, glm, kimi, minimax, deepseek - OSS/alt: groq, together, fireworks, mistral - `RESOLUTION_ORDER` tuple: priority for auto-detect (back-compat first, then commercial, then Chinese, then OSS/alt) - `resolve_provider(explicit=None)` -> (ProviderConfig, api_key) - With explicit name: routes to that provider, raises if env var empty - Without: walks RESOLUTION_ORDER, first env-var-set provider wins `workspace-template/adapters/hermes/executor.py` (refactored): - `create_executor(hermes_api_key=None, provider=None, model=None)` now has three parameters: - `hermes_api_key`: PR 2 back-compat — routes to Nous Portal - `provider`: canonical short name from the registry (e.g. "anthropic") - `model`: optional override of the provider's default model - Delegates all resolution to `providers.resolve_provider()` — no more hardcoded URLs or env var lookups in the executor itself - `HermesA2AExecutor.__init__` no longer has Nous-specific defaults; callers pass base_url + model explicitly (which create_executor always does) `workspace-template/tests/test_hermes_providers.py` (new file, 26 tests): - Registry shape invariants (count >= 15, no duplicates, every config valid) - PR 2 back-compat: HERMES_API_KEY / OPENROUTER_API_KEY still route correctly - Auto-detect for every provider in the registry (parametrized — guards against typos in env var lists) - Explicit `provider=` bypass of auto-detect - Error cases: unknown provider, explicit-but-empty, auto-detect-with-no-env - All 26 tests pass locally in 0.08s
## Back-compat guarantees
| Scenario | PR 2 behavior | This PR behavior |
|---|---|---|
| `create_executor(hermes_api_key="x")` | Nous Portal | Nous Portal (unchanged) |
| `HERMES_API_KEY=x` env, auto-detect | Nous Portal | Nous Portal (unchanged) |
| `OPENROUTER_API_KEY=x` env, auto-detect | OpenRouter | OpenRouter (unchanged) |
| Both env + explicit hermes_api_key param | Nous Portal (param wins) | Nous Portal (param wins, unchanged) |
Nothing existing can break. New callers gain access to 13 more providers. ## What's NOT in this PR (Phase 2) - Native Anthropic Messages API path — better tool calling, vision, extended thinking. Requires pulling in `anthropic` SDK. ~50 LOC. - Native Gemini generateContent path — for vision + google tools. Requires `google-genai` SDK. ~50 LOC. - Streaming support across all providers — current executor is non-streaming (single chat.completions.create call). Streaming works with openai.AsyncOpenAI but hasn't been wired to the A2A event queue path. ~30 LOC. - Per-provider model overrides in config.yaml — Phase 1 uses the registry's default_model. 
Phase 2 adds a `hermes: { provider: qwen, model: qwen3-coder-plus }` block in the workspace config. - `.env.example` updates — not critical since the registry itself documents every env var via the `env_vars` field, but nice-to-have. ## Related - Queued memory: `project_hermes_multi_provider.md` - CEO directive 2026-04-15: "once current works are cleared, I want you to focus on supporting hermes agent, right now it doesnt take too much providers" - `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's eco-watch entry listed "Nous Portal, OpenRouter, GLM, Kimi, MiniMax, OpenAI, …" which shaped this registry's initial set ## Test plan - [x] Unit tests: 26/26 pass locally (pytest) - [ ] CI will run on the self-hosted macOS arm64 runner - [ ] Smoke test in a real workspace: set QWEN_API_KEY and verify Technical Researcher actually hits Alibaba DashScope successfully - [ ] Integration test per provider with real API keys (gated on env, skip when not set — Phase 2 CI addition)	2026-04-15 11:14:35 -07:00
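Resolution as described (explicit name first, otherwise the first provider in RESOLUTION_ORDER with a non-empty env var) can be sketched with a three-entry registry. This is not the real providers.py: base URLs and model names are placeholders, and the env mapping is passed in explicitly so the example does not depend on the real environment.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional, Tuple


@dataclass(frozen=True)
class ProviderConfig:
    name: str
    env_vars: Tuple[str, ...]
    base_url: str
    default_model: str


# Three-entry subset; URLs and model names are placeholders.
PROVIDERS: Dict[str, ProviderConfig] = {
    "nous_portal": ProviderConfig("nous_portal", ("HERMES_API_KEY",), "https://nous.example.invalid/v1", "placeholder"),
    "openrouter": ProviderConfig("openrouter", ("OPENROUTER_API_KEY",), "https://openrouter.example.invalid/v1", "placeholder"),
    "qwen": ProviderConfig("qwen", ("QWEN_API_KEY",), "https://qwen.example.invalid/v1", "placeholder"),
}
RESOLUTION_ORDER: Tuple[str, ...] = ("nous_portal", "openrouter", "qwen")


def _api_key(cfg: ProviderConfig, env: Mapping[str, str]) -> Optional[str]:
    # First non-empty env var in the provider's list wins.
    for var in cfg.env_vars:
        if env.get(var):
            return env[var]
    return None


def resolve_provider(
    explicit: Optional[str] = None, env: Mapping[str, str] = os.environ
) -> Tuple[ProviderConfig, str]:
    if explicit is not None:
        cfg = PROVIDERS.get(explicit)
        if cfg is None:
            raise ValueError(f"unknown provider: {explicit!r}")
        key = _api_key(cfg, env)
        if not key:
            raise ValueError(f"provider {explicit!r} selected but none of {cfg.env_vars} is set")
        return cfg, key
    for name in RESOLUTION_ORDER:
        key = _api_key(PROVIDERS[name], env)
        if key:
            return PROVIDERS[name], key
    raise ValueError("no provider API key found in the environment")
```

Putting the PR 2 providers first in RESOLUTION_ORDER is what preserves back-compat: a workspace that already sets HERMES_API_KEY keeps resolving to Nous Portal even if other keys are added later.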
Hongming Wang	519d478ea2	Merge pull request #207 from Molecule-AI/fix/issue-115-scheduler-busy-skip fix(scheduler): #115 — skip cron fire when workspace busy	2026-04-15 11:13:20 -07:00
Hongming Wang	2624d28f0c	fix(scheduler): #115 — skip cron fire when workspace is busy Closes #115. The Security Auditor hourly cron (and likely others) hit a ~36% miss rate because the platform's A2A proxy rejected fires with "workspace agent busy — retry after a short backoff" while the agent was still executing the prior audit. That error was recorded as a hard failure and polluted last_error. New behaviour: Before fireSchedule calls into the A2A proxy, it reads workspaces.active_tasks for the target. If >0, it: - Advances next_run_at to the next cron slot (cron keeps ticking) - Bumps run_count - Sets last_status='skipped' + last_error=<reason> - Inserts a cron_run activity_logs row with status='skipped' + error_detail - Broadcasts CRON_SKIPPED for canvas + operators Effect: busy-collision ceases to be an error. The history surface now distinguishes "ran and failed" from "skipped because busy". Operators can tell the difference at a glance, and the liveness view doesn't stall waiting for the next ticker cycle. Pairs with #149 (dedicated heartbeat pulse) and #152 problem B (error_detail surfaced in history) for a coherent scheduler story. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:13:15 -07:00
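The busy-skip branch can be sketched as a function that updates the schedule row before the A2A proxy is ever called. `skip_if_busy` and the `next_slot` callable (standing in for the cron parser) are hypothetical; the field names and the skipped status follow the commit message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class Schedule:
    next_run_at: datetime
    run_count: int = 0
    last_status: Optional[str] = None
    last_error: Optional[str] = None


def skip_if_busy(
    sched: Schedule,
    active_tasks: int,
    now: datetime,
    next_slot: Callable[[datetime], datetime],
    activity_log: List[dict],
    broadcast: Callable[[str], None],
) -> bool:
    """Return True if the fire was skipped (caller must not call the A2A proxy)."""
    if active_tasks <= 0:
        return False
    reason = f"workspace busy (active_tasks={active_tasks})"
    sched.next_run_at = next_slot(now)  # cron keeps ticking
    sched.run_count += 1
    sched.last_status = "skipped"
    sched.last_error = reason
    activity_log.append({"activity_type": "cron_run", "status": "skipped", "error_detail": reason})
    broadcast("CRON_SKIPPED")
    return True


def hourly(t: datetime) -> datetime:
    """Hypothetical next-slot function for an hourly cron."""
    return t.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
```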
Hongming Wang	894265d269	Merge pull request #206 from Molecule-AI/fix/issue-152-schedule-history-error-detail fix(scheduler): #152 problem B — surface cron error_detail in schedule history	2026-04-15 11:11:21 -07:00
Hongming Wang	4d7c0ee01d	fix(scheduler): #152 problem B — persist and surface cron error_detail Closes #152 problem B (schedule history API drops error detail). Two tiny changes: 1. scheduler.fireSchedule now writes lastError into activity_logs.error_detail when inserting the cron_run row. Previously the column was left NULL even on failure because the INSERT didn't include it. 2. schedules.History SELECT now reads error_detail and includes it in the JSON response under error_detail. Frontend + audit cron can now display "why did this run fail" instead of just "status=error". No schema change — activity_logs.error_detail already exists from migration 009. This just starts using the column. Problem A of #152 (Research Lead ecosystem-watch 50% error rate on its own) is a separate ops investigation and stays open. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:11:16 -07:00
rabbitblood	4dfb7a42b7	feat(workspace): add idle-loop reflection pattern (Hermes/Letta shape) Today's multi-framework research (Hermes, Letta, Trigger.dev, Inngest, AG2, Rivet, n8n, Composio, SWE-agent — see docs/ecosystem-watch.md) confirmed that nobody runs while(true) per agent. The working patterns are: (a) event-driven + hibernation (Hermes, Letta, Trigger.dev, Inngest) (b) cron/user-triggered ephemeral runs (AG2, Rivet, n8n, SWE-agent) Molecule AI is currently 100% in category (b). Observed team utilization: ~0.5% — agents idle 99.5% of the time because cron fires and CEO-typed A2A are the only initiating signals. CEO's north-star is 24/7 iteration, current cadence falls short. This PR closes the gap by adding an in-workspace idle loop that wakes the agent periodically ONLY when it has no active task. The shape is the Hermes reflection-on-completion pattern combined with the Letta backlog-pull pattern, collapsed into a ~60 LOC change in the workspace-template. Zero new Go code. Zero new DB tables. Zero new API endpoints. ## How it works 1. `config.py` gets two new fields on WorkspaceConfig: - `idle_prompt: str = ""` — the prompt to self-send when idle - `idle_interval_seconds: int = 600` — how often to check (default 10 min) Both support inline or file ref (matching the initial_prompt pattern). 2. `main.py` spawns an `_run_idle_loop()` asyncio task alongside the existing initial_prompt task (same lifecycle hooks — cancelled in the `finally:` of the server.serve() block). 3. The loop body: a. Sleep interval b. Check `heartbeat.active_tasks == 0` LOCALLY (no LLM call, no HTTP) c. If idle → self-POST the idle_prompt via the existing /workspaces/{id}/a2a proxy d. Loop The agent's own concurrency control rejects the post if it becomes busy between the check and the POST — that's the safety valve. 4. Gated on `config.idle_prompt` being non-empty. Default = "" = no loop. 
Existing workspaces upgrade silently as no-ops until someone explicitly opts in by setting idle_prompt in org.yaml (either defaults: or per-workspace:). ## Cost analysis (from the research report) - while(true) pattern: ~$93/day/org (12 agents × 12 thinks/hour × $0.027). Unshippable. - Hermes reflection-on-completion: ~$0.45/day/org. Cost ∝ useful work. - This PR's idle loop at 10-min cadence: upper bound 12 × 6/hour × 24h × ~3k tokens × Sonnet rate ≈ $5/day/org PER ROLE, only if they're genuinely idle every check. In practice far less because busy periods skip the LLM call entirely (the active_tasks check is local). ## Rollout plan Research report recommended rolling to ONE workspace first (Technical Researcher) and measuring 24h of activity_logs before enabling for all 12. This PR enables the mechanism; it does NOT add any default idle_prompt to org-templates/molecule-dev/org.yaml. That's a follow-up PR after this one lands and one workspace has been manually opted in for measurement. 
## Not touched in this PR - No Go code (no new platform endpoint, no new DB columns) - No org.yaml changes (zero-impact until someone opts in) - No scheduler changes (the idle loop is a workspace concern, not a scheduler concern — matches the research report's layering) ## Test plan - [x] Python syntax check (ast.parse) on main.py + config.py - [ ] Unit test: WorkspaceConfig parses idle_prompt / idle_interval_seconds from yaml - [ ] Integration test: set idle_prompt on Technical Researcher, measure that an A2A message is received every ~10 min while idle, and NOT received while busy with a delegation - [ ] Dogfood: enable on Technical Researcher for 24h, count activity_logs delta vs baseline, confirm cost stays within model ## Related - Today's research report (conversation output, summarized in commit trailer) - docs/ecosystem-watch.md → `### Hermes Agent` (the canonical reflection-on-completion example) - #159 orchestrator/worker split — complementary: leaders pulse for dispatch, workers idle-loop for pull. Together: leaders push work, workers pull work, no role ever sits idle with a cold queue.	2026-04-15 11:09:43 -07:00
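The loop body in steps a to d maps onto a short coroutine. A sketch, with a hypothetical `max_ticks` hook so it terminates in tests; in the workspace it would run until cancelled in the `finally:` block and would only be spawned when `idle_prompt` is non-empty.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def run_idle_loop(
    interval_seconds: float,
    active_tasks: Callable[[], int],
    send_idle_prompt: Callable[[], Awaitable[None]],
    max_ticks: Optional[int] = None,  # hypothetical test hook; None = run until cancelled
) -> None:
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        await asyncio.sleep(interval_seconds)  # a. sleep
        ticks += 1
        if active_tasks() == 0:  # b. local check, no LLM call, no HTTP
            await send_idle_prompt()  # c. self-POST the idle_prompt


# Spawn only when opted in (sketch of the main.py wiring):
#   if config.idle_prompt:
#       idle_task = asyncio.create_task(run_idle_loop(...))
```

Because the busy check is a local counter read, a busy tick costs nothing; only idle ticks reach the model.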
Hongming Wang	2f28384757	Merge pull request #203 from Molecule-AI/fix/issue-168-route-split fix(auth): #168 — CanvasOrBearer on PUT /canvas/viewport (route-split)	2026-04-15 11:09:22 -07:00
Hongming Wang	f0dcb81a24	fix(auth): #168 — CanvasOrBearer middleware for PUT /canvas/viewport only Closes #168 via the route-split approach from #194's review. #167 put PUT /canvas/viewport behind strict AdminAuth, breaking canvas drag/zoom persistence because the canvas uses session cookies, not bearer tokens. New narrow middleware CanvasOrBearer: - Accepts a valid bearer (same contract as AdminAuth) OR - Accepts a request whose Origin exactly matches CORS_ORIGINS - Lazy-bootstrap fail-open preserved for fresh installs Applied ONLY to PUT /canvas/viewport. The softer check is acceptable there because viewport corruption is cosmetic-only: worst case, a user refreshes the page. This middleware must NOT be used on routes that leak prompts (#165), create resources (#164), or write files (#190); see the #194 review for why. The other canvas-facing routes mentioned in #168 (Events tab, Bundle Export/Import) remain behind strict AdminAuth pending a proper session-cookie-accepting AdminAuth (#168 follow-up for Phase H). 6 new tests cover: bootstrap fail-open, no-creds 401, canvas origin match, wrong origin 401, empty origin rejected, localhost default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:09:16 -07:00
Hongming Wang	9a23180fa9	Merge pull request #198 from Molecule-AI/fix/a2a-compat-batch-173-174-175 fix(a2a): A2A protocol compliance — cancel(), capabilities, push store (closes #173 #174 #175)	2026-04-15 11:02:11 -07:00
Hongming Wang	d24d385a1b	Merge branch 'main' into fix/a2a-compat-batch-173-174-175	2026-04-15 11:01:54 -07:00
Hongming Wang	be3746ffc3	Merge pull request #200 from Molecule-AI/fix/issue-190-templates-import-auth fix(security): #190 — gate POST /templates/import behind AdminAuth	2026-04-15 11:00:54 -07:00
Hongming Wang	7c9192063d	fix(security): #190 — gate POST /templates/import behind AdminAuth Closes #190 (HIGH). The route was registered on the root router with no auth middleware, letting any unauthenticated caller write arbitrary files into configsDir via a crafted template. Same vulnerability class as #164 (bundles/import), with the same path-traversal risk as #103 (org/import). One-line gate via the existing wsAdmin pattern. Lazy-bootstrap fail-open preserved for fresh installs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:00:49 -07:00
Hongming Wang	458c743ad6	Merge pull request #197 from Molecule-AI/fix/ci-python-bypass-setup-python fix(ci): apply bypass-setup-python to main (missed in #186 squash)	2026-04-15 10:58:27 -07:00
Hongming Wang	b2761ba568	fix(ci): apply user's bypass-setup-python to main (missed in #186 squash-merge) #186's squash-merge commit (`aa419477`) took 15e15a21 (AGENT_TOOLSDIRECTORY override) but missed a6cfc5f (bypass setup-python entirely), which was pushed to the PR branch after the merge was initiated, so the merge commit still has the old setup-python@v5 job config. Applies a6cfc5f's ci.yml verbatim via git checkout. Restores the Homebrew-python3.11 bypass path that the user prototyped. No other changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 10:58:22 -07:00
Backend Engineer	1c07046332	fix(a2a): cancel() event, stateTransitionHistory capability, wire push store (#173 #174 #175) #173 — implement cancel() in LangGraphA2AExecutor: emits TaskStatusUpdateEvent(state=canceled, final=True) so clients see the state transition rather than silence. Removes pragma: no cover. Test: test_cancel_emits_canceled_event. #174 — add stateTransitionHistory=True to AgentCapabilities in main.py so microsoft/agent-framework clients know they can request full task history via the A2A protocol. #175 — wire InMemoryPushNotificationConfigStore and PushNotificationSender into DefaultRequestHandler so the advertised pushNotifications capability is backed by a real store. Both classes live in a2a.server.tasks (a2a-sdk 0.3.25); import confirmed by probe. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 17:58:10 +00:00
Hongming Wang	74046ca2cf	Merge pull request #187 from Molecule-AI/fix/issue-179-trusted-proxies fix(router): SetTrustedProxies(nil) closes rate-limit bypass via X-Forwarded-For (#179)	2026-04-15 10:55:01 -07:00

302 Commits