Commit Graph

1070 Commits

Author SHA1 Message Date
Hongming Wang
b76f9dbcdb Merge pull request #242 from Molecule-AI/docs/gdpr-erasure-runbook
docs: GDPR Art. 17 erasure runbook
2026-04-15 13:49:28 -07:00
Hongming Wang
5d7deb9363 Merge pull request #260 from Molecule-AI/feat/pricing-page
feat(canvas): /pricing route with plan selector + Stripe checkout
2026-04-15 13:48:47 -07:00
Hongming Wang
4b865fa755 feat(canvas): /pricing route with plan selector + Stripe checkout
Adds a public /pricing route the apex + tenant canvas can both serve.
Three-tier plan cards (Free, Starter, Pro) with per-plan CTA buttons
that dispatch correctly regardless of the user's state:

  Free              → redirect to signup
  Anonymous + paid  → redirect to signup (Stripe opens post-auth)
  Authed + paid     → POST /cp/billing/checkout, redirect to Stripe URL
  No tenant slug    → inline error ("pick an org first")
  Network failures  → surfaced in an ARIA alert banner
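
The dispatch rules above can be read as one small decision function. The sketch below restates them in Python for illustration only; the shipped logic lives in the TypeScript client component. The function name, the action strings, the lowercase plan ids other than 'starter', and the order of the checks are all assumptions.

```python
from typing import Optional

# Assumed plan ids for the two paid tiers; anything else is treated as Free.
PAID_PLANS = {"starter", "pro"}


def resolve_cta_action(plan_id: str, authed: bool, tenant_slug: Optional[str]) -> str:
    """Return a hypothetical action name for a plan card's CTA click."""
    if plan_id not in PAID_PLANS:
        return "redirect_signup"   # Free -> signup
    if not authed:
        return "redirect_signup"   # Anonymous + paid -> signup, Stripe opens post-auth
    if not tenant_slug:
        return "inline_error"      # No tenant slug -> "pick an org first"
    return "post_checkout"         # Authed + paid -> POST /cp/billing/checkout
```

Network failures are not modelled here; per the list above they surface in the alert banner after the checkout request fails.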

Files:
- src/lib/billing.ts — plan metadata + startCheckout + openBillingPortal
  wrappers over /cp/billing/{checkout,portal}
- src/components/PricingTable.tsx — client component, lazy session
  probe on first CTA click (no probe for anonymous browsers)
- src/app/pricing/page.tsx — server-rendered shell with SEO metadata,
  links to legal pages in the footer
- Tests: 10 billing helper tests + 9 PricingTable tests (17 total,
  additional ones cover the plan-list canonical order)

Design notes:
- The pricing data (features + prices) is a static const in billing.ts,
  not fetched from the API. Changing prices requires a deploy — which
  we'd need to do anyway for tier definition changes.
- PLAN_ID 'starter' is flagged highlighted=true so the middle card gets
  the 'Most popular' visual treatment. One source of truth; test locks it.
- Session probe is lazy (first CTA click, not mount) so anonymous
  visitors don't generate a /cp/auth/me request just to read the page.

AuthGate interaction:
- On apex (no tenant slug), AuthGate passthrough — /pricing renders freely
- On tenant subdomain, AuthGate still bounces anonymous users to login
  before reaching /pricing — this is the correct UX for the "I'm already
  logged in and want to upgrade my own org" flow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:41:44 -07:00
Hongming Wang
7bfc40f2bd docs: add Resend + Stripe to saas-secrets runbook
Extends the secret map with RESEND_API_KEY, RESEND_FROM_EMAIL,
STRIPE_API_KEY, STRIPE_WEBHOOK_SECRET — the four SaaS secrets the
control plane reads once the current PR stack (#29-#34 on
molecule-controlplane) ships.

Adds rotation procedures for each:
- Resend: low-blast-radius, best-effort sends, domain verification
  gotcha documented
- Stripe API key: independent rotation from webhook secret, live verify
  via /cp/billing/checkout
- Stripe webhook secret: 24h overlap window procedure using stripe
  trigger for live verify

Also adds Resend + Stripe entries to the emergency-contacts list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:35:23 -07:00
rabbitblood
485dcb4cae feat(hermes): Phase 2b — native Google Gemini generateContent dispatch path
Completes Hermes Phase 2 by adding the second native SDK path: Google Gemini
via the official `google-genai` Python SDK. Stacked on top of Phase 2a
(feat/hermes-phase2-native-sdks) which introduced the dispatch infra +
the anthropic native path.

## What's new in this PR

1. `providers.py`: flip `gemini` entry to `auth_scheme="gemini"` and
   update `base_url` from the OpenAI-compat endpoint
   (`/v1beta/openai`) to the bare host
   (`https://generativelanguage.googleapis.com`) which the native SDK
   uses.

2. `executor.py`: new method `_do_gemini_native(task_text)` that uses
   `google.genai.Client().aio.models.generate_content(...)`. Dispatch
   table in `_do_inference` now routes `"gemini"` → `_do_gemini_native`.
   Same fail-loud semantics as `_do_anthropic_native` — missing SDK
   raises a clear RuntimeError with install instructions.

3. `requirements.txt`: add `google-genai>=1.0.0`.

4. `test_hermes_phase2_dispatch.py`: +3 tests
   - `test_gemini_entry_has_gemini_scheme` — registry flip + base URL
     validated
   - `test_dispatch_gemini_scheme_calls_gemini_native` — dispatch runs
     gemini native, not openai-compat or anthropic-native
   - `test_gemini_native_raises_clear_error_when_sdk_missing` — fail-loud
     on missing `google-genai` package
   Plus updated existing dispatch tests to mock `_do_gemini_native`
   alongside the other paths so "no cross-calls" assertions stay tight.

All 36 tests pass locally (10 Phase 2 dispatch + 26 Phase 1 registry):

    pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
    36 passed in 0.07s

## Dispatch table after this PR

    auth_scheme="openai"     → _do_openai_compat (13 providers)
    auth_scheme="anthropic"  → _do_anthropic_native (1 provider, Phase 2a)
    auth_scheme="gemini"     → _do_gemini_native (1 provider, Phase 2b) ← NEW
    <unknown>                → _do_openai_compat + warning (forward-compat)

## Back-compat

- All 13 openai-scheme providers unchanged
- `hermes_api_key` / `HERMES_API_KEY` / `OPENROUTER_API_KEY` paths unchanged
- Only `gemini` provider changes behavior: now uses native generateContent
  instead of the `/v1beta/openai` compat shim
- Existing Gemini callers setting `GEMINI_API_KEY` get the native path
  automatically — no caller changes needed

## What's NOT in this PR (future phases)

- Streaming support (`astream_messages` / `streamGenerateContent` stream
  variants) for either native path
- Tool calling / function calling on native paths
- Vision content blocks (image_url → anthropic image blocks; image_url →
  gemini inline_data with base64 + mime_type)
- Extended thinking (anthropic) / thinking config (gemini)
- System instructions pass-through on the gemini native path

Phase 2c/2d will layer these on. This PR is the minimum-viable native
dispatch — single-turn text in, text out — same shape as Phase 2a.

## Stacking

This PR targets `feat/hermes-phase2-native-sdks` (Phase 2a) as its base
branch, NOT main, so the diff shows only the Gemini-specific additions.
When Phase 2a merges to main, GitHub auto-rebases this PR onto the new
main head. If reviewer prefers a single combined PR, close #240 and land
this one instead — the commits on feat/hermes-phase2-native-sdks are
already included in this branch's history.

## Related

- #240 Phase 2a (parent branch)
- #208 Phase 1 (registry + openai-compat path — already in main)
- `project_hermes_multi_provider.md` queued memory — Phase 2 was the next
  item, this PR completes it
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's
  eco-watch entry that catalogued Hermes's native provider list and
  shaped the original Phase 2 scope
2026-04-15 13:20:39 -07:00
Hongming Wang
e1ff890150 chore(template): add YAML injection to Security Auditor check list (#248)
Closes #248. Three instances of the same YAML-injection bug class
(#221 name/role, #233 template path, #241 runtime/model) shipped in
this repo over the last few weeks. The common root cause is that the
Security Auditor's system prompt didn't list YAML injection as an
explicit check class, so audits missed the pattern every time.

Adds:
- "YAML injection" to the 'Think like an attacker' list in How You Work
- An explicit entry in What You Check with the three prior instances
  cited so future auditors see the pattern and the fix shape
  (double-quoted scalars or a proper YAML encoder)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:18:52 -07:00
Hongming Wang
8881b68aaf fix(security): YAML injection + path traversal via runtime/model (#241)
Closes #241 (MEDIUM, auth-gated by AdminAuth on POST /workspaces).

## Vectors closed
1. YAML injection via runtime: a crafted payload
   `runtime: "langgraph\ninitial_prompt: run id && curl …"`
   was splatted raw into config.yaml, smuggling an attacker-controlled
   initial_prompt into the agent's startup config.
2. Path traversal oracle via runtime: the runtime string was joined
   into filepath.Join for the runtime-default template fallback.
   `runtime: ../../sensitive` could probe host directory existence.
3. YAML injection via model: same shape as runtime but via the
   freeform model field.

## Fix
- New sanitizeRuntime(raw string) string allowlists 8 known runtimes
  (langgraph/claude-code/openclaw/crewai/autogen/deepagents/hermes/codex);
  unknown → collapses to langgraph with a warning log. Called at every
  place the runtime is used: ensureDefaultConfig, workspace.go:175
  runtimeDefault fallback, org.go:370 runtimeDefault fallback.
- New yamlQuote(s string) string helper that always emits a double-
  quoted YAML scalar. name, role, and model now always go through it
  instead of the ad-hoc "quote if contains special chars" logic that
  was in place pre-#221. Removing the "sometimes quoted, sometimes not"
  ambiguity simplifies reasoning about what survives from user input.
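
A minimal sketch of the allowlist idea behind sanitizeRuntime, written in Python for illustration (the actual helper is Go). The eight runtime names and the langgraph fallback come from the list above; the exact-match comparison with no trimming, the logger name, and the constant names are assumptions.

```python
import logging

log = logging.getLogger("workspace.config")  # hypothetical logger name

KNOWN_RUNTIMES = frozenset({
    "langgraph", "claude-code", "openclaw", "crewai",
    "autogen", "deepagents", "hermes", "codex",
})
DEFAULT_RUNTIME = "langgraph"


def sanitize_runtime(raw: str) -> str:
    """Return raw only if it is a known runtime; otherwise fall back with a warning."""
    if raw in KNOWN_RUNTIMES:
        return raw
    # %r keeps newlines and other control characters escaped in the log line.
    log.warning("unknown runtime %r, using %s", raw, DEFAULT_RUNTIME)
    return DEFAULT_RUNTIME
```

Because the result is always one of a fixed set of literals, neither a newline-carrying payload nor a `../` path fragment can reach the config template or the filepath join.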

## Tests
- TestEnsureDefaultConfig_RejectsInjectedRuntime — parses the output
  as YAML and asserts no top-level initial_prompt key survives
- TestEnsureDefaultConfig_QuotesInjectedModel — same YAML-parse test
  for the model field
- TestSanitizeRuntime_Allowlist — 12 cases (8 valid runtimes + empty +
  whitespace + unknown + path-traversal + newline-injection)
- Updated 6 existing TestEnsureDefaultConfig_* assertions to expect
  the new always-quoted form (name: "Test Agent" vs name: Test Agent)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:17:32 -07:00
Hongming Wang
81d5b658ad fix(security): gate /channels/discover behind AdminAuth (#250)
Closes #250 (MEDIUM). POST /channels/discover was on the open router
and accepted an arbitrary Telegram bot token, turning it into:
 1. A free bot-token validity oracle — attackers can enumerate/probe
    tokens at zero cost
 2. A drive-by deleteWebhook side effect — every call invokes
    tgbotapi.DeleteWebhookConfig against the target bot, breaking
    legitimate webhook delivery
 3. A rate-limit amplifier — getMe + deleteWebhook + getUpdates per call

Fix: one-line addition of middleware.AdminAuth(db.DB) to the route,
matching its actual intent (platform-operator admin helper, not a
per-workspace route). Pattern mirrors /admin/liveness, /events, and
/bundles/export from PR #167.

No new test: AdminAuth behavior is covered by
wsauth_middleware_test.go; this PR only wires it onto an additional
route. The load-bearing code comment references #250 so future
reviewers can't revert without an issue citation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:11:22 -07:00
Hongming Wang
0dd4f25952 feat(canvas): cookie consent banner with privacy-preserving default
Adds a GDPR/ePrivacy-compliant cookie banner to the canvas root layout.
Privacy-preserving default: no optional cookies are considered accepted
until the user clicks "Accept all". Clicking "Necessary only" or
dismissing records "rejected" and the banner does not re-appear until
the cookie-policy version bumps.

- New CookieConsent component wired into src/app/layout.tsx so it
  renders on every canvas route
- Persists decision to localStorage as {decision, decidedAt, version}
- Versioned schema: bumping CURRENT_VERSION re-prompts every user
- Exports hasConsent() helper for feature code that needs to gate
  analytics / functional cookies on user choice
- ARIA: role=dialog + aria-labelledby/aria-describedby so screen
  readers announce it as a dialog
- Same storage key + schema as the control-plane legal-page banner
  (see molecule-controlplane PR #XX) so a user who accepts on one
  surface does not re-see the banner on the other
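
For illustration, the consent check can be sketched as a pure function over the stored JSON string. This is a Python restatement, not the shipped TypeScript helper; the "accepted" decision value, the CURRENT_VERSION value, and the function signature are assumptions (the message above only names "rejected" and the {decision, decidedAt, version} shape).

```python
import json
from typing import Optional

CURRENT_VERSION = 1  # assumed value; bumping it invalidates older records


def has_consent(stored: Optional[str], current_version: int = CURRENT_VERSION) -> bool:
    """Return True only for a well-formed, current-version 'accepted' record."""
    if not stored:
        return False                      # first visit: nothing stored yet
    try:
        record = json.loads(stored)
    except json.JSONDecodeError:
        return False                      # invalid JSON is treated as no decision
    if not isinstance(record, dict):
        return False
    if record.get("version") != current_version:
        return False                      # policy version bumped: re-prompt
    return record.get("decision") == "accepted"
```

Every failure mode resolves to False, which matches the privacy-preserving default described above.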

Tests: 12 Vitest cases covering first-visit render, accept/reject
persistence, version re-prompt, invalid-JSON recovery, privacy link
attrs, ARIA markup, and the hasConsent helper under every state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:01:48 -07:00
Hongming Wang
9b82bce7ef docs: GDPR Art. 17 erasure runbook
Documents the 4-step hard-delete cascade implemented in
molecule-controlplane PR #29 (Stripe → Redis → Infra → DB rows),
how to read the org_purges audit table when a purge fails, the 30-day
GDPR deadline, and what the cascade deliberately does NOT cover
(WorkOS users, LLM provider history, Langfuse traces).

Cross-referenced from the "SaaS ops" block in CLAUDE.md so future
agents find it when handling erasure requests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 12:42:16 -07:00
rabbitblood
3985d80220 feat(hermes): Phase 2a — native Anthropic Messages API dispatch path
Completes the Hermes adapter's native-SDK plan for the provider that gains
the most from leaving OpenAI-compat: Anthropic. OpenAI-compat works fine for
plain text turns on every provider (Phase 1 covered that with one code path
for all 15 providers), but Anthropic's Messages API has first-class tool use,
vision content blocks, and extended thinking that the OpenAI-compat shim
strips or mis-translates.

Rather than ship all native SDK paths in one PR (Anthropic + Gemini + future),
this lands Anthropic only (Phase 2a). Gemini is Phase 2b, shipping after a
production measurement window on Phase 2a.

## Design

Providers now dispatch by `auth_scheme` field. Phase 1 added the field but
every provider used `"openai"`. Phase 2 flips `anthropic` to `"anthropic"`
and wires a second inference path keyed on that:

- `HermesA2AExecutor._do_openai_compat(task_text)` — existing path, handles
  14 of 15 providers (Nous Portal, OpenRouter, OpenAI, xAI, Gemini, Qwen,
  GLM, Kimi, MiniMax, DeepSeek, Groq, Together, Fireworks, Mistral)
- `HermesA2AExecutor._do_anthropic_native(task_text)` — NEW, uses the
  official `anthropic` Python SDK's `AsyncAnthropic().messages.create(...)`
- `HermesA2AExecutor._do_inference(task_text)` — dispatches by
  `self.provider_cfg.auth_scheme`

Unknown schemes fall back to OpenAI-compat with a logged warning, so future
provider additions don't crash if a native SDK path ships late.
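
A minimal sketch of dispatch-by-scheme with the warning fallback, assuming a plain dict as the dispatch table. The helper name `pick_handler`, the logger name, and the stand-in lambdas are hypothetical; the real methods are the `_do_*` coroutines on `HermesA2AExecutor`.

```python
import logging
from typing import Callable, Dict

log = logging.getLogger("hermes.dispatch")  # hypothetical logger name

Handler = Callable[[str], str]


def pick_handler(auth_scheme: str, handlers: Dict[str, Handler]) -> Handler:
    """Look up the inference path for a scheme, defaulting to OpenAI-compat."""
    handler = handlers.get(auth_scheme)
    if handler is None:
        log.warning("unknown auth_scheme %r, falling back to openai-compat", auth_scheme)
        return handlers["openai"]
    return handler


# Stand-ins for _do_openai_compat and _do_anthropic_native.
handlers: Dict[str, Handler] = {
    "openai": lambda task_text: f"openai-compat:{task_text}",
    "anthropic": lambda task_text: f"anthropic-native:{task_text}",
}
```

Keeping the fallback inside the lookup means a registry entry can gain a new scheme before its native path lands without breaking inference.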

## Fail-loud on missing SDK

`_do_anthropic_native` raises a clear `RuntimeError` with install
instructions if the `anthropic` package is missing at runtime:

    Hermes anthropic native path requires the `anthropic` package. Install
    in the workspace image with `pip install anthropic>=0.39.0` or set
    HERMES provider=openrouter to route Claude models through OpenRouter's
    OpenAI-compat shim instead.

This is intentional: silent fallback would mask fidelity loss (tool_use
blocks become plain text, vision gets stripped). Loud failure is better.
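
The fail-loud import can be sketched as below. The helper name `require_sdk` and the message wording are hypothetical; only the behavior (a RuntimeError carrying install instructions instead of a silent fallback) is taken from the description above.

```python
import importlib
from types import ModuleType


def require_sdk(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional SDK or raise a RuntimeError that says how to fix it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original ImportError so the traceback keeps the root cause.
        raise RuntimeError(
            f"native path requires the `{module_name}` package. {install_hint}"
        ) from exc
```

A call such as `require_sdk("anthropic", "Install with pip install 'anthropic>=0.39.0'.")` at the top of the native path would either return the module or stop the request with an actionable message.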

`requirements.txt` adds `anthropic>=0.39.0` so the package is baked into
the workspace-template image build path. Operators building custom workspace
images without anthropic installed get the loud error.

## Back-compat

- `create_executor(hermes_api_key="x")` → still routes to Nous Portal
  (`auth_scheme="openai"`), unchanged
- `HERMES_API_KEY` env var → still first in RESOLUTION_ORDER
- `OPENROUTER_API_KEY` env var → still second
- All 14 OpenAI-compat providers unchanged — they take the same code path
  as before
- ONLY `anthropic` provider changes behavior: it now uses the native
  Messages API instead of the `/v1/chat/completions` compat shim

## Constructor signature change

`HermesA2AExecutor.__init__` now takes `provider_cfg: ProviderConfig`
instead of separate `api_key + base_url + model`. The three fields are
derived from `provider_cfg` + an optional model override. This is a
breaking change for any external caller building an executor directly,
but the only documented public entry point is `create_executor()`, which
is updated in the same commit to pass the cfg through.

## Test coverage

`workspace-template/tests/test_hermes_phase2_dispatch.py` — 7 new tests:

1. `test_anthropic_entry_has_anthropic_scheme` — registry flip
2. `test_all_other_providers_still_openai_scheme` — regression guard
3. `test_dispatch_openai_scheme_calls_openai_compat` — happy path
4. `test_dispatch_anthropic_scheme_calls_anthropic_native` — happy path
5. `test_dispatch_unknown_scheme_falls_back_to_openai_compat` — forward compat
6. `test_anthropic_native_raises_clear_error_when_sdk_missing` — fail-loud
7. `test_create_executor_passes_provider_cfg` — constructor wiring

All pass locally (pytest tests/test_hermes_phase2_dispatch.py -v, 0.04s).
Phase 1 tests unchanged: `test_hermes_providers.py` 26/26 pass, no
regressions.

## What's NOT in this PR (Phase 2b)

- Gemini native `generateContent` path (`auth_scheme="gemini"`)
- Streaming support across both native paths (`astream_messages`, `streamGenerateContent`)
- Tool calling on the anthropic native path (the `tools` + `tool_use` blocks)
- Vision content blocks (image_url → anthropic image blocks)
- Extended thinking parameter passthrough

All scoped in `project_hermes_multi_provider.md`. Phase 2a is the minimum
viable native Anthropic dispatch — single-turn text in, text out, no tools.

## Related

- Phase 1 baseline (already in main): #208 — provider registry + OpenAI-compat path
- Queued memory: `project_hermes_multi_provider.md` — full phased plan
- Triggering directive: CEO 2026-04-15 — "once current works are cleared,
  focus on supporting hermes agent"
2026-04-15 12:23:56 -07:00
Hongming Wang
fda2b56532 docs: sync CLAUDE.md + PLAN.md + edit-history with 2026-04-15 overnight sweep
Captures ~27 PRs merged across both repos this session: security
hardening cluster (#94/#99/#106/#110/#119/#162/#155/#167/#185/#200/#203/
#209/#233), data-integrity fixes (#212/#224/#236), CI runner migration
(#186), platform/scheduler reliability (#95/#149/#207/#206), workspace
runtime features (#205/#208/#198/#216/#225/#235/#231), code-review
follow-ups (#228/#232).

Updated counts: 816 Go (+70), 1180 Python (+40), 453 vitest (unchanged
— UI/a11y patches), 97 jest (unchanged).

CLAUDE.md additions:
- Idle Loop section (#205) under Architectural Patterns
- Admin auth middleware variants section linking docs/runbooks/admin-auth.md
- Migration runner section explaining the .down.sql filter (#212)
- Per-route auth notes in the API table (PATCH field-whitelist, CanvasOrBearer
  on PUT /canvas/viewport, AdminAuth on bundles/events/templates-import/
  approvals-pending/admin-liveness)
- Database section updated with workspace_auth_tokens auto-revoke (#110),
  scheduler.error_detail surfacing (#206), workspace_schedules.last_status
  'skipped' state (#207)

PLAN.md additions:
- New Recently launched (overnight sweep) section with full PR/issue index
- Phase status updated (B–G now complete, H partial)
- Live infrastructure deltas (migration fix, token rotation, legal pages)
- Outstanding items consolidated

Edit-history file expanded from the tick-9 stub to a full session record
covering malware cleanup, CI runner migration, security cluster, data
integrity, infra/feature/code-review batches, and outstanding user
actions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 12:16:24 -07:00
Hongming Wang
eb6796042b Merge pull request #236 from Molecule-AI/fix/issue-234-log-injection
fix(security): #234 — sanitize source_id spoof log line via %q
2026-04-15 12:04:32 -07:00
Hongming Wang
8efc06aca6 fix(security): #234 — sanitize source_id spoof log line via %q
Closes #234 LOW. The security log I added in PR #228 (code-review
follow-up) echoed body.SourceID with %s, which preserves any \n / \r
that json.Unmarshal decoded from the attacker's JSON. An authenticated
workspace could have injected fake log entries by sending
source_id="evil\ntimestamp=FORGED level=INFO msg=fake".

Fix: use %q on both body_source_id and c.ClientIP(). Go-quoted string
escapes all control characters so multi-line payloads stay on a single
log line. One-line fix.
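
The same idea in Python, for illustration: quote every attacker-influenced field so control characters are escaped before they reach the log. `json.dumps` stands in for Go's %q here; the function name and the message prefix are hypothetical, and quoting authed_workspace as well is an extra precaution beyond what the fix above describes.

```python
import json


def spoof_log_line(authed_workspace: str, body_source_id: str, client_ip: str) -> str:
    """Build a single-line security log entry with all fields quoted and escaped."""
    return (
        "security: source_id spoof "
        f"authed_workspace={json.dumps(authed_workspace)} "
        f"body_source_id={json.dumps(body_source_id)} "
        f"client_ip={json.dumps(client_ip)}"
    )
```

With the payload from the report, the newline arrives in the log as the two characters `\n` inside a quoted string, so the forged `level=INFO` text stays part of the body_source_id value.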

Regression test: TestActivityHandler_Report_SourceIDLogInjection
exercises the code path with a literal \n in source_id. Assertion is
limited to "handler returns 403 cleanly with no panic" because
capturing log output in Go tests requires a log.SetOutput swap, which
adds noise for little signal vs just reading the test log output
(visible when running with -v).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 12:04:26 -07:00
Hongming Wang
7d89fd4ea4 Merge pull request #235 from Molecule-AI/fix/issue-220-initial-idle-prompt-auth
fix(workspace-template): #220 — auth_headers on initial_prompt + idle loop
2026-04-15 12:02:06 -07:00
Hongming Wang
1c41c30310 fix(workspace-template): #220 — send auth_headers() on initial_prompt + idle loop
Closes #220. #215 added auth_headers() to /registry/register but missed
two other self-post paths from the same workspace container:

1. initial_prompt (_do_send_sync) — fires once on first boot after the
   A2A server is ready. Posts to /workspaces/:id/a2a via the platform
   proxy. Missing headers meant the initial prompt got silently
   dropped as 401 on any token-enrolled workspace.

2. idle loop (_post_sync) — fires every idle_interval_seconds while
   the workspace has no active task (#205 pattern). Same proxy path,
   same missing headers, same silent 401 in multi-tenant mode.

Both now build headers as
  {"Content-Type": "application/json", **auth_headers()}

auth_headers() returns {"Authorization": "Bearer <token>"} when
/auth-token.txt exists, empty dict otherwise (first boot before
register issues the token). The existing lazy-bootstrap fail-open
on the platform side covers the empty-dict case.
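
A minimal sketch of the header construction described above. The token path is the one named in this message; the `token_path` parameter, the `build_headers` helper, and treating an empty file like a missing one are assumptions made so the example is testable.

```python
from pathlib import Path
from typing import Dict

TOKEN_PATH = Path("/auth-token.txt")  # path named in the commit message


def auth_headers(token_path: Path = TOKEN_PATH) -> Dict[str, str]:
    """Return a bearer header if a token file exists, otherwise an empty dict."""
    try:
        token = token_path.read_text().strip()
    except OSError:
        return {}  # first boot: register has not issued a token yet
    return {"Authorization": f"Bearer {token}"} if token else {}


def build_headers(token_path: Path = TOKEN_PATH) -> Dict[str, str]:
    """Headers for the self-post paths (initial_prompt and idle loop)."""
    return {"Content-Type": "application/json", **auth_headers(token_path)}
```

Spreading the dict keeps both call sites identical whether or not a token exists.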

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 12:02:01 -07:00
Hongming Wang
533beb8da3 Merge pull request #233 from Molecule-AI/fix/issue-226-create-template-traversal
fix(security): #226 — gate POST /workspaces template against traversal
2026-04-15 12:00:32 -07:00
Hongming Wang
3d561b24ef fix(security): #226 — gate POST /workspaces template/runtime against traversal
Closes #226 MEDIUM. WorkspaceHandler.Create joined payload.Template
directly into filepath.Join(configsDir, template) without validating
it stayed inside configsDir. An attacker posting Template="../../etc"
would have the provisioner walk and mount arbitrary host directories
into the workspace container.

Same fix as #103 (POST /org/import): use the existing resolveInsideRoot
helper to reject absolute paths and any ".." that escapes the root.
Applied at both call sites in workspace.go:
  1. Synchronous runtime detection before DB insert — 400 on bad input
  2. Async provisioning goroutine — early return, logs the rejection
     (belt-and-suspenders; the create path already blocks)
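
For illustration, a Python sketch of the kind of check resolveInsideRoot performs (the real helper is Go, and its exact behavior is not shown here). This version is purely lexical: it does not resolve symlinks, and the function name and error messages are hypothetical.

```python
import os


def resolve_inside_root(root: str, user_path: str) -> str:
    """Join user_path under root, rejecting absolute paths and '..' escapes."""
    if os.path.isabs(user_path):
        raise ValueError("absolute paths are not allowed")
    root_abs = os.path.abspath(root)
    candidate = os.path.normpath(os.path.join(root_abs, user_path))
    # commonpath compares whole path components, so a sibling directory that
    # merely shares a string prefix with the root is also rejected.
    if os.path.commonpath([root_abs, candidate]) != root_abs:
        raise ValueError("path escapes the root")
    return candidate
```

Comparing path components rather than string prefixes is what covers the prefix-sibling case.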

No test added inline because the existing resolveInsideRoot suite
(org_path_test.go) already covers absolute / traversal / prefix-sibling
/ empty-path / deep-subpath cases. A duplicate test for the workspace
handler wouldn't add signal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 12:00:26 -07:00
Hongming Wang
07cd0a2dfa Merge pull request #224 from Molecule-AI/fix/issue-221-yaml-injection
fix(security): sanitize workspace name before YAML interpolation
2026-04-15 11:59:10 -07:00
Hongming Wang
dc5c4b9dfa Merge pull request #231 from Molecule-AI/fix/160-sdk-error-probe
fix(claude-sdk): #160 — probe CLI directly when SDK swallowed the real stderr
2026-04-15 11:58:59 -07:00
Hongming Wang
6ebfefa64f Merge pull request #227 from Molecule-AI/test/issue-217-plugin-pipeline-tests
test(handlers): unit test suite for plugins_install_pipeline.go
2026-04-15 11:58:56 -07:00
Hongming Wang
edf72a80f8 Merge pull request #225 from Molecule-AI/fix/issue-215-register-auth
fix(workspace-template): add auth_headers() to /registry/register POST
2026-04-15 11:58:53 -07:00
Hongming Wang
f1899aa67f Merge pull request #216 from Molecule-AI/feat/tr-idle-prompt
chore(template): enable idle-loop pilot on Technical Researcher (#205 follow-up)
2026-04-15 11:58:50 -07:00
Hongming Wang
81d05bd7e3 Merge pull request #223 from Molecule-AI/fix/reno-stars-browser-automation-default
fix(reno-stars): default plugins to browser-automation
2026-04-15 11:58:46 -07:00
Hongming Wang
76bd2a2ccf fix(security): #221 — quote name as YAML scalar instead of stripping newlines
The original fix stripped \n/\r but left the rest in place, then relied
on a substring-based test which was over-strict (the escaped fragment
still contained the banned substring as bytes).

Better approach: emit the name as a double-quoted YAML scalar with all
escape sequences (\\, \", \n, \r, \t) handled inline. This is the
canonical YAML-safe way to embed user input — no injection possible
because every control character is either escaped or rejected by the
YAML parser inside the scalar context.
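
A Python sketch of the quoting approach, for illustration only (the helper in this repo is Go). It handles the five escapes listed above; escaping the remaining ASCII control characters with `\xNN` is an added assumption, not something the commit states.

```python
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def yaml_quote(value: str) -> str:
    """Emit value as a double-quoted YAML scalar with control characters escaped."""
    out = []
    for ch in value:
        if ch in _ESCAPES:
            out.append(_ESCAPES[ch])
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append(f"\\x{ord(ch):02x}")  # YAML double-quoted \xNN escape
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'
```

The output never contains a raw newline, so an injected `initial_prompt:` fragment stays inside the scalar. Parsing the result with a real YAML parser, as the rewritten test does, remains the authoritative check.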

Test rewritten to parse the output as YAML and verify:
  1. parsed["name"] equals the literal attacker input (payload preserved)
  2. no banned top-level keys leaked to the parsed map
  3. legitimate default keys (description/version/tier/model) still present

Updated the two existing tests that asserted the unquoted name format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:58:16 -07:00
Hongming Wang
a202db15e1 Merge branch 'main' into fix/160-sdk-error-probe
2026-04-15 11:54:13 -07:00
Hongming Wang
4f18442e6a Merge branch 'main' into test/issue-217-plugin-pipeline-tests
2026-04-15 11:54:12 -07:00
Hongming Wang
4f59fa1ede Merge branch 'main' into fix/issue-221-yaml-injection
2026-04-15 11:54:10 -07:00
Hongming Wang
715ecc2caf Merge branch 'main' into fix/issue-215-register-auth
2026-04-15 11:54:09 -07:00
Hongming Wang
975f55a560 Merge branch 'main' into feat/tr-idle-prompt
2026-04-15 11:54:08 -07:00
Hongming Wang
564f377d1b Merge branch 'main' into fix/reno-stars-browser-automation-default
2026-04-15 11:54:06 -07:00
Hongming Wang
daf1ce8cb5 Merge pull request #232 from Molecule-AI/fix/code-review-idle-loop-and-docs
fix(code-review): idle loop hardening + idle_prompt docs + admin-auth runbook
2026-04-15 11:52:06 -07:00
Hongming Wang
54b49ffd1b fix(code-review): idle loop hardening + idle_prompt docs + admin-auth runbook
Addresses items 4, 5, 7 from the self-review of the batch merge. PR A
(#228) covered items 1, 2, 3, 6 on the Go side.

## workspace-template/main.py — idle loop hardening

- Replace asyncio.get_event_loop() with asyncio.get_running_loop() —
  the former is deprecated in 3.12+ and emits a DeprecationWarning on
  every idle fire.
- Replace hardcoded urlopen timeout=600 with IDLE_FIRE_TIMEOUT_SECONDS
  clamped to max(60, min(300, idle_interval_seconds)). Long cadence
  workspaces no longer hold dangling requests open for 10 minutes; the
  cap adapts automatically when the interval is short.
- Type the exception handling: split HTTPError (has .code) from URLError
  (connection-level) from the generic catch-all. Log status + error
  class separately so operators can grep for specific failure modes
  instead of a bare "post failed".
- Fire-and-forget no longer loses exceptions. run_in_executor Future
  now has an add_done_callback that logs the outcome, so an exception in
  _post_sync surfaces as "Idle loop: post failed — status=None err=..."
  instead of Python's default "Task exception was never retrieved"
  warning buried in stderr.
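
Two of the hardening steps above, sketched under stated assumptions: the clamp formula is the one quoted in this message, while the function names, logger name, and log wording are hypothetical and `post_sync` stands in for the real `_post_sync`.

```python
import asyncio
import logging
from typing import Callable

log = logging.getLogger("idle_loop")  # hypothetical logger name


def idle_fire_timeout(idle_interval_seconds: float) -> float:
    """Clamp the request timeout to the 60..300 second range."""
    return max(60, min(300, idle_interval_seconds))


def _log_outcome(future: "asyncio.Future") -> None:
    """Done-callback: retrieve the exception so it is logged, not dropped."""
    if future.cancelled():
        return
    exc = future.exception()
    if exc is not None:
        log.error("Idle loop: post failed: err=%s: %s", type(exc).__name__, exc)


def schedule_idle_post(post_sync: Callable[[], None]) -> "asyncio.Future":
    """Run the blocking post in the default executor without awaiting it."""
    loop = asyncio.get_running_loop()  # must be called from a running loop
    future = loop.run_in_executor(None, post_sync)
    future.add_done_callback(_log_outcome)
    return future
```

Calling `future.exception()` inside the callback marks the exception as retrieved, which is what suppresses asyncio's "never retrieved" warning.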

## org-templates/molecule-dev/org.yaml — discoverability

Added idle_prompt + idle_interval_seconds to the defaults: block with
explanatory comments. Without this, users had to read main.py to
discover the feature.

## docs/runbooks/admin-auth.md — new

Documents the three middleware variants (AdminAuth strict,
CanvasOrBearer soft, WorkspaceAuth per-id), the exact contract of each,
and the three-question test for adding a new route to CanvasOrBearer.
Also flags the session-cookie follow-up as Phase H.

Referenced PRs: #138, #164, #165, #166, #167, #168, #190, #194, #203,
#228.

No code deltas in platform/ beyond the Python + YAML + docs changes.
Full pytest suite unchanged except the pre-existing test_hermes_smoke
flake that fails in full-suite but passes in isolation (test isolation
bug, not introduced by this PR).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:52:01 -07:00
rabbitblood
1151265b72 fix(claude-sdk): #160 — probe CLI directly when SDK swallowed the real stderr
Context: when the claude-agent-sdk wraps a stream error from the CLI
subprocess that it can't categorize (rate limit, auth, network), it
raises a bare `Exception("Command failed with exit code 1\nError output:
Check stderr output for details")`. The exception has no `.stderr` or
`.exit_code` attributes, so #66's `_format_process_error` — which reads
those attributes — has nothing to surface. The log line becomes:

    SDK agent error [claude-code]: Exception: Command failed with exit
    code 1 (exit code: 1)\nError output: Check stderr output for details

That's the placeholder text from the SDK's error path, not the actual
error. Operators chasing a stuck workspace are forced to `docker exec
ws-xxx claude --print` manually to discover the real cause. Observed
today during the rate-limit incident: every PM error line was identical
"Check stderr output for details" while the real cause ("You've hit
your limit · resets Apr 17, 11pm (UTC)") was only visible via manual
reproduction — that cost ~20 minutes of diagnosis time.

## Fix

Add `_probe_claude_cli_error()`: a best-effort subprocess call that runs
`claude --print` with a small probe input, captures stderr+stdout, and
returns the real error string. Bounded by 30s timeout so a hung CLI
can't stall the error path.

Extend `_format_process_error` with ONE narrow fallback: if the
exception has no stderr/exit_code AND its message contains the specific
"Check stderr output for details" marker, call the probe and append
`probed_cli_error=<real error>` to the formatted line.

Critically: the probe only runs in the narrow case where we have
nothing else to log. If `.stderr` or `.exit_code` are present (the
normal ProcessError path from #66), the probe is skipped — no wasted
subprocess, no 30s latency on every error.
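
The gating condition can be sketched as follows. This is a simplified stand-in for `_format_process_error`, not its actual body: the output layout and the injected `probe` callable are assumptions, with `probe` playing the role of `_probe_claude_cli_error()`.

```python
from typing import Callable

SWALLOWED_MARKER = "Check stderr output for details"


def format_process_error(exc: BaseException, probe: Callable[[], str]) -> str:
    """Format an error line, probing the CLI only when nothing else is known."""
    parts = [f"{type(exc).__name__}: {exc}"]
    stderr = getattr(exc, "stderr", None)
    exit_code = getattr(exc, "exit_code", None)
    if stderr:
        parts.append(f"stderr={stderr!r}")
    if exit_code is not None:
        parts.append(f"exit_code={exit_code}")
    # Narrow fallback: no stderr, no exit code, and the SDK's placeholder text.
    if stderr is None and exit_code is None and SWALLOWED_MARKER in str(exc):
        parts.append(f"probed_cli_error={probe()!r}")
    return " | ".join(parts)
```

Passing the probe in as a callable mirrors how the tests monkeypatch it, so no subprocess is spawned under test.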

## Test coverage

`workspace-template/tests/test_claude_sdk_executor.py` adds 3 new tests:
- `test_format_process_error_probes_cli_when_stderr_swallowed` — the
  happy path: exception matches the marker, probe runs, result appears
  in the formatted line. Probe is monkeypatched so no subprocess spawns
  in the test.
- `test_format_process_error_does_not_probe_when_stderr_already_present` —
  negative: regular ProcessError with `.stderr` set does NOT trigger
  the probe (skip the wasted call).
- `test_format_process_error_does_not_probe_without_swallowed_marker` —
  negative: unrelated plain exceptions (e.g. RuntimeError) do NOT
  trigger the probe (so the common-case error path stays fast).

All 7 `_format_process_error` tests pass locally (4 existing + 3 new):

    pytest tests/test_claude_sdk_executor.py -k format_process_error
    ======================= 7 passed in 0.06s ========================

## Impact

Next time the SDK swallows a real error (rate limit, auth failure,
network outage), the workspace log will contain the actual error string
alongside the generic placeholder:

    SDK agent error [claude-code]: Exception: Command failed with exit
    code 1 ... | probed_cli_error="You've hit your limit · resets Apr
    17, 11pm (UTC)"

Diagnosis time drops from "docker exec each ws, run claude --print,
read stderr" (~20 min) to "grep probed_cli_error in platform logs"
(~10 seconds).

Closes #160.
2026-04-15 11:50:55 -07:00
Hongming Wang
acb8721bbb Merge pull request #228 from Molecule-AI/fix/code-review-go-batch
fix(code-review): Go-side follow-ups from self-review batch
2026-04-15 11:48:30 -07:00
Hongming Wang
35705274c9 fix(code-review): CanvasOrBearer fall-through, scheduler short(), activity spoof log + 6 new tests
Addresses self-review of the 10-PR batch merged earlier this session.
Splits the follow-ups into this Go-side PR and a later Python/docs PR.

## Fixes

1. wsauth_middleware.go CanvasOrBearer — invalid bearer now hard-rejects
   with 401 instead of falling through to the Origin check. Previous code
   let an attacker with an expired token + matching Origin bypass auth.
   Empty bearer still falls through to the Origin path (the intended
   canvas path).

2. scheduler.go short() helper — extracts safe UUID prefix truncation.
   Pre-existing unsafe [:12] and [:8] slices would panic on workspace IDs
   shorter than the bound. #115's new skip path had the bounds check;
   the happy-path log lines did not. One helper, three call sites.

3. activity.go security-event log on source_id spoof — #209 added the
   403 but the attempt was invisible to any auditor cron. Stable
   greppable log line with authed_workspace, body_source_id, client IP.
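
The decision order from fix 1 can be sketched as below. The real middleware is Go; this Python version is only an illustration, and `canvas_or_bearer`, `token_is_valid`, `allowed_origins`, and the 401 returned for a failed Origin check are all assumptions made for the example.

```python
from typing import Callable, FrozenSet, Optional


def canvas_or_bearer(
    authorization: Optional[str],
    origin: Optional[str],
    token_is_valid: Callable[[str], bool],
    allowed_origins: FrozenSet[str],
) -> int:
    """Return the HTTP status the middleware would answer with (200 = pass)."""
    bearer = ""
    if authorization and authorization.startswith("Bearer "):
        bearer = authorization[len("Bearer "):].strip()

    if bearer:
        # A presented token is authoritative: a valid one passes, an invalid
        # one is rejected outright and never falls through to the Origin check.
        return 200 if token_is_valid(bearer) else 401

    # Empty bearer: the intended canvas path, decided by Origin alone.
    return 200 if origin in allowed_origins else 401
```

The point of the sketch is the early return: once a bearer token is present, the Origin header is never consulted.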

## New tests

- TestShort_helper — bounds-safety regression guard for the helper
- TestRecordSkipped_writesSkippedStatus — #115 coverage gap, exercises
  UPDATE + INSERT via sqlmock
- TestRecordSkipped_shortWorkspaceIDNoPanic — short-ID crash regression
- TestActivityHandler_Report_SourceIDSpoofRejected — #209 403 path
- TestActivityHandler_Report_MatchingSourceIDAccepted — non-spoof path
- TestHistory_IncludesErrorDetail — #152 problem B coverage
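
The property that TestShort_helper guards can be sketched as below. Python slices already clamp to the string length, whereas a Go slice expression such as `id[:12]` panics when the string is shorter, so the bounds check is written out explicitly here to mirror what a Go helper has to do. The signature is an assumption; the real `short()` may differ.

```python
def short(identifier: str, n: int) -> str:
    """Return at most the first n characters of identifier, never raising."""
    if n <= 0:
        return ""
    if len(identifier) <= n:
        # Shorter than the bound: return it whole instead of slicing past the end.
        return identifier
    return identifier[:n]
```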

go test -race ./... green locally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:48:25 -07:00
Dev Lead Agent
96b1bb7630 test(handlers): add unit test suite for plugins_install_pipeline.go
The 13K-line plugins_install_pipeline.go had zero unit tests, making it
the highest-regression-risk file in the platform handlers package.

New test file covers all testable pure-function and integration paths
that do not require a live Docker daemon:

  validatePluginName (8 cases)
    - valid names, empty, forward slash, backslash, "..", embedded "..",
      and path-traversal variants ("../etc", "../../secrets")

  dirSize (6 cases)
    - empty dir, single file, multiple files, nested subdirectory,
      exceeds limit (verifies error mentions "cap"), exactly at limit

  httpErr / newHTTPErr (3 cases)
    - Error() contains status code, all relevant HTTP codes preserved,
      errors.As unwraps through fmt.Errorf %w chains

  regexpEscapeForAwk (6 cases)
    - alphanumeric names unchanged, slash escaped, dot escaped, + escaped,
      full "# Plugin: name /" marker (space not escaped), backslash escaped

  streamDirAsTar (4 cases)
    - empty dir yields zero entries, single file round-trips content,
      nested directory preserves relative path, entries have no absolute
      or tempdir-leaking paths

  resolveAndStage via stubResolver (10 cases)
    - empty source → 400, unknown scheme → 400, happy path (result fields),
      staged dir cleaned on fetch error, ErrPluginNotFound → 404,
      DeadlineExceeded → 504, generic error → 502, resolver returns invalid
      name → 400, local:// path traversal → 400 (pre-Fetch validation)

stubResolver implements plugins.SourceResolver as an in-process test
double — no network, no filesystem side-effects beyond the staging tempdir
that resolveAndStage creates and cleans up.
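
The error-to-status mapping and staging cleanup listed for resolveAndStage can be sketched as follows. The real code is Go; every name here (`PluginNotFoundError`, `HTTPError`, `resolve_and_stage`, the resolver callable shape) is a stand-in invented for the example, with `TimeoutError` standing in for DeadlineExceeded.

```python
import shutil
import tempfile
from typing import Callable, Dict, Tuple


class PluginNotFoundError(Exception):
    """Stand-in for ErrPluginNotFound."""


class HTTPError(Exception):
    """Carries the status code the handler would respond with."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status


def status_for_fetch_error(err: Exception) -> int:
    if isinstance(err, PluginNotFoundError):
        return 404
    if isinstance(err, TimeoutError):  # stand-in for DeadlineExceeded
        return 504
    return 502  # any other resolver failure


def resolve_and_stage(
    source: str, resolvers: Dict[str, Callable[[str, str], str]]
) -> Tuple[str, str]:
    """Fetch a plugin into a fresh staging dir; clean the dir up on failure."""
    if not source:
        raise HTTPError(400, "empty source")
    scheme, sep, _ = source.partition("://")
    if not sep or scheme not in resolvers:
        raise HTTPError(400, f"unknown scheme in {source!r}")

    staged = tempfile.mkdtemp(prefix="plugin-stage-")
    try:
        name = resolvers[scheme](source, staged)
    except Exception as err:
        shutil.rmtree(staged, ignore_errors=True)  # never leak the staging dir
        raise HTTPError(status_for_fetch_error(err), str(err)) from err
    return name, staged
```

A stub resolver is then just a function that returns a name or raises, which is what keeps such tests free of network access.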

Closes #217

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 18:47:25 +00:00
Dev Lead Agent
b8f810dd21 fix(workspace-template): include auth_headers() on /registry/register POST
The register call was missing headers=auth_headers(), so workspaces that
already have a persisted token (i.e. every restart after the first boot)
were sending an unauthenticated request. The platform's register handler
returns 401 for requests missing a valid bearer token once a token has
been issued, causing re-registration to fail on every restart.

Import auth_headers at the module level (alongside the existing save_token
inline import) and pass it to the httpx POST. auth_headers() returns {}
when no token is on file yet (first boot), so there is no regression for
fresh workspaces — the platform still issues a token on the 200 response
and save_token() persists it for all subsequent restarts.
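
A minimal sketch of the behaviour described above, assuming the token lives in a file and is sent as a standard `Authorization: Bearer` header. The `token_file` parameter, the header format, and the injected `post` callable are assumptions for illustration; the real `auth_headers()` takes no arguments and its storage details are not shown here.

```python
from pathlib import Path
from typing import Any, Callable, Dict


def auth_headers(token_file: Path) -> Dict[str, str]:
    """Return bearer headers if a token is on file, else an empty dict."""
    if not token_file.exists():
        return {}  # first boot: nothing persisted yet
    token = token_file.read_text().strip()
    return {"Authorization": f"Bearer {token}"} if token else {}


def register(
    post: Callable[..., Any], url: str, payload: Dict[str, Any], token_file: Path
) -> Any:
    # `post` is any callable with an httpx.post-like shape (url, json=, headers=).
    # Always passing headers is safe: on first boot they are simply empty.
    return post(url, json=payload, headers=auth_headers(token_file))
```

Because the headers are empty rather than absent on first boot, the same call covers both the fresh-workspace and the restart case.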

Closes #215

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 18:44:53 +00:00
Dev Lead Agent
1c4649945c fix(security): sanitize body.Name before YAML interpolation in generateDefaultConfig
A crafted workspace name containing a newline (e.g. "x\nmodel: evil")
could inject arbitrary YAML keys into the auto-generated config.yaml.
Strip \n and \r from the name before interpolation. YAML key injection
requires a newline to start a new mapping entry; other characters such
as `:` cannot open a new key on their own, although `: ` or ` #` inside
an unquoted scalar can still change how that one value parses.

Adds TestGenerateDefaultConfig_YAMLInjection with three adversarial
inputs: bare \n injection, CRLF injection, and multi-key injection.
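
The fix amounts to something like the sketch below. The real function is Go; the Python names and the `model: default` line in the template are placeholders for illustration only.

```python
def sanitize_name(name: str) -> str:
    """Drop CR and LF so the name cannot start a new YAML mapping entry."""
    return name.replace("\n", "").replace("\r", "")


def generate_default_config(name: str) -> str:
    # The template body is a placeholder; only the sanitising step matters here.
    return f"name: {sanitize_name(name)}\nmodel: default\n"
```

With the newline stripped, an input such as `"x\nmodel: evil"` stays on the `name:` line instead of contributing a second `model:` key.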

Closes #221

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 18:44:11 +00:00
airenostars
2aa1fc990d fix(reno-stars): default plugins to browser-automation
Every agent in the reno-stars org (marketing, sales, dev, coordinator)
plausibly needs browser access at some point — social posts, GBP edits,
directory submissions, InvoiceSimple publish. Without the plugin on
first import, agents fall back to launching their own Chromium inside
the container, which doesn't have the operator's authenticated Chrome
profile (no logged-in sessions, no saved cookies).

Per-agent opt-out via `!browser-automation` is already supported
(PR #71 UNION merge semantics) if any specific role shouldn't have it.
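
One way to picture the opt-out is the sketch below. It assumes that org-level defaults are unioned with each agent's own list and that a leading `!` removes an entry; the exact semantics of PR #71 are not shown here, so treat `merge_plugins` and its ordering rules as hypothetical.

```python
from typing import Iterable, List


def merge_plugins(defaults: Iterable[str], agent_entries: Iterable[str]) -> List[str]:
    """Union org defaults with an agent's plugins; '!name' opts out of name."""
    merged = list(dict.fromkeys(defaults))  # de-duplicate, keep order
    for entry in agent_entries:
        if entry.startswith("!"):
            name = entry[1:]
            merged = [p for p in merged if p != name]
        elif entry not in merged:
            merged.append(entry)
    return merged
```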

Closes #213
2026-04-15 11:43:48 -07:00
rabbitblood
9ceaea33ce chore(template): enable idle-loop pilot on Technical Researcher (#205 follow-up)
PR #205 shipped the workspace idle-loop mechanism (reflection-on-completion
pattern from the Hermes/Letta research survey) but deliberately added NO
default idle_prompt in org.yaml so rollout could be measured one workspace
at a time before going team-wide.

This is that first opt-in: Technical Researcher gets a backlog-pull + reflect
idle prompt on a 10-minute cadence.

## Why TR first

- Research-heavy role with a naturally bursty load — lots of idle time
  between the once-per-hour plugin curation cron fires
- Non-user-facing (no canvas UI impact, no UX risk)
- Already has a clear backlog shape: the plugin curation cron produces
  findings that could feed follow-up studies
- Vision-free (no Playwright) so cost per idle tick is pure text

## What the idle_prompt does

Three-step reflection, under 60s wall-clock, max 1 A2A send per tick:

1. **Backlog pull** — search_memory "research-backlog:technical-researcher"
   for any stashed research questions (from prior cron fires or Research
   Lead delegations). If found → delegate_task to Research Lead with a
   concrete deliverable spec, then commit_memory to remove the item from
   the backlog.

2. **Reflection fallback** — if backlog is empty, look at the last memory
   entry from the Hourly plugin curation cron. Does it surface a follow-up
   study worth doing? If yes → file a GH issue labeled `research` and
   commit_memory to put the question on the backlog for next tick.

3. **Idle-clean outcome** — if neither backlog nor reflection produced
   anything, write "tr-idle HH:MM — clean" to memory and stop. No busy work.

Hard rules enforce: max 1 A2A per tick, skip step 1 if Research Lead busy,
under 60s wall-clock, never re-run a cron's own prompt from inside the idle
loop.
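
The three steps and the hard rules can be read as the decision flow below. In practice the idle_prompt is natural-language instructions to the agent rather than code, so this is only a model of the intended control flow; `idle_tick` and its return tuples are invented for the illustration.

```python
from typing import List, Optional, Tuple


def idle_tick(
    backlog: List[str],
    lead_busy: bool,
    curation_follow_up: Optional[str],
    now_hhmm: str,
) -> Tuple[str, str]:
    """Return the single action taken this tick (at most one A2A send)."""
    # Step 1: backlog pull, skipped while Research Lead is busy.
    if backlog and not lead_busy:
        item = backlog.pop(0)  # remove the item once it has been delegated
        return ("delegate_task", item)

    # Step 2: reflection fallback, only when the backlog is empty.
    if not backlog and curation_follow_up:
        backlog.append(curation_follow_up)  # queue it for the next tick
        return ("file_issue", curation_follow_up)

    # Step 3: nothing to do, so record a clean idle tick and stop.
    return ("idle_clean", now_hhmm)
```

Each branch returns immediately, which is how the sketch models the limit of one A2A send per tick.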

## Rollout plan

- **This PR**: enables TR only via the `idle_prompt` + `idle_interval_seconds`
  fields added to its workspace entry in org.yaml.
- **Next 24h**: measure activity_logs delta on TR vs baseline, count
  idle-fired delegations vs idle-clean outcomes, confirm Research Lead
  isn't being flooded.
- **If green** (delegations land useful work, no flood): roll to Market
  Analyst + Competitive Intelligence in a follow-up PR.
- **If noisy** (too many idle fires producing nothing): tune idle_interval
  up to 1200-1800s.

## Apply locally per feedback rule

Per `feedback_apply_template_locally_too.md`: not waiting for merge. After
pushing this PR I'll edit TR's live /configs/config.yaml to add the same
idle_prompt + idle_interval_seconds fields, then restart ws-57e13b54-119
(Technical Researcher) so the new workspace-template binary picks up the
idle loop immediately. Measurement clock starts from that restart.

## Related
- #205 (mechanism) — just merged in this cycle (7f11328)
- #208 Hermes Phase 1 — also just merged (be53a33)
- docs/ecosystem-watch.md → `### Hermes Agent` — reflection-on-completion
  pattern reference
2026-04-15 11:34:51 -07:00
Hongming Wang
5ce848367e Merge pull request #212 from Molecule-AI/fix/issue-211-migration-runner-skips-down
fix(db): #211 — migration runner skips *.down.sql (stop wiping data on boot)
2026-04-15 11:24:11 -07:00
Hongming Wang
0b627816ed fix(db): #211 — migration runner skips *.down.sql (stop wiping data on boot)
Closes #211 HIGH ops/security. RunMigrations globbed `*.sql` which
matches both `.up.sql` AND `.down.sql`. Alphabetical sort puts "d"
before "u", so every platform boot ran the rollback BEFORE the forward
migration for every up/down pair (the pairs begin at migration 018).

Net effect: every restart wiped workspace_auth_tokens (the 020 pair),
which in turn regressed AdminAuth to its fail-open bootstrap bypass for
every route protected by it — the live server was effectively
unauthenticated from restart until the next workspace re-registered.
Also wiped 018_secrets_encryption_version and 019_workspace_access
pairs silently.

Fix is a 3-line filter: skip files whose base name ends in `.down.sql`.
Down migrations remain on disk for operator-driven rollback via psql,
but are never picked up by the auto-run loop.

Added unit test against a tmp dir to lock the filter behaviour so this
can never regress: stages a mix of legacy plain .sql, matched up/down
pairs, asserts only forward files survive.
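
The ordering bug and the filter can be reproduced in a few lines. The real runner is Go; this Python sketch only mirrors the idea, and `forward_migrations` plus the file names used with it are hypothetical.

```python
from pathlib import Path
from typing import List


def forward_migrations(directory: Path) -> List[str]:
    """List migration files in apply order, skipping rollback scripts."""
    names = sorted(p.name for p in directory.glob("*.sql"))
    # Without this filter "020_x.down.sql" sorts before "020_x.up.sql",
    # so the rollback would run first on every boot.
    return [n for n in names if not n.endswith(".down.sql")]
```

Legacy plain `.sql` files and `.up.sql` files pass through unchanged; only names ending in `.down.sql` are dropped.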

Follow-up (not in this PR): the runner still re-applies every migration
on every boot. Migrations must be idempotent. A proper schema_migrations
tracking table is tracked as a future cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:24:06 -07:00
Hongming Wang
7f11328e22 Merge pull request #205 from Molecule-AI/feat/workspace-idle-loop
feat(workspace): add idle-loop reflection pattern (Hermes/Letta shape, opt-in, ~90 LOC)
2026-04-15 11:21:47 -07:00
Hongming Wang
8a011c9f51 Merge remote-tracking branch 'origin/main' into feat/workspace-idle-loop
2026-04-15 11:21:15 -07:00
Hongming Wang
be53a33546 Merge pull request #208 from Molecule-AI/feat/hermes-phase1-provider-registry
feat(hermes): Phase 1 — multi-provider registry (15 providers, 26 tests, back-compat preserved)
2026-04-15 11:21:05 -07:00
Hongming Wang
80ae2bd6ad Merge remote-tracking branch 'origin/main' into feat/hermes-phase1-provider-registry
2026-04-15 11:20:51 -07:00
Hongming Wang
c2fee56d59 Merge branch 'main' into feat/hermes-phase1-provider-registry
2026-04-15 11:20:06 -07:00
Hongming Wang
2b8aef5c60 Merge pull request #210 from Molecule-AI/fix/issue-204-push-sender-abstract
fix(workspace-template): #204 — drop PushNotificationSender (abstract class)
2026-04-15 11:18:57 -07:00
Hongming Wang
7d7d5995e0 fix(workspace-template): #204 — drop PushNotificationSender (abstract class)
Closes #204. PR #198 wired push_sender=PushNotificationSender() into
DefaultRequestHandler to satisfy #175's push-notification capability,
but PushNotificationSender in a2a-sdk is an abstract base class and
cannot be instantiated. Every workspace container crashed on startup
with TypeError.

Reverted to DefaultRequestHandler's defaults. The pushNotifications
capability still appears in AgentCard.capabilities (advertised to A2A
clients) but actual implementation of the sender is deferred to a
Phase-H follow-up that subclasses PushNotificationSender properly.
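
The failure mode is plain Python behaviour and can be shown without a2a-sdk. In this sketch `PushSender` and `LoggingPushSender` are hypothetical stand-ins, not the SDK's classes, and the `send` method name is an assumption.

```python
from abc import ABC, abstractmethod
from typing import List


class PushSender(ABC):
    """Hypothetical stand-in for an abstract sender base class."""

    @abstractmethod
    def send(self, payload: dict) -> None:
        ...


class LoggingPushSender(PushSender):
    """A concrete subclass: implementing every abstract method makes it usable."""

    def __init__(self) -> None:
        self.sent: List[dict] = []

    def send(self, payload: dict) -> None:
        self.sent.append(payload)


def try_instantiate(cls):
    """Return an instance, or the TypeError raised while constructing one."""
    try:
        return cls()
    except TypeError as err:
        return err
```

Instantiating the base class raises TypeError at construction time, which is why the crash only showed up when main.py actually ran; a proper follow-up would subclass the base and implement its abstract methods.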

Existing pytest suite unchanged (the crash was only at runtime on
main.py import, which no existing test exercises directly).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:18:52 -07:00