Closes #N (issue to be filed)
Lets canvas / operators see live tool calls + AI thinking instead of
waiting for the high-level activity log to flush. Right now the only
way to "look over an agent's shoulder" is `docker exec ws-XXX cat
/home/agent/.claude/projects/.../<session>.jsonl`, which:
- doesn't work for remote workspaces (Phase 30 / Fly Machines)
- requires shell access on the host
- has no pagination
This PR adds:
1. `BaseAdapter.transcript_lines(since, limit)` — async hook returning
`{runtime, supported, lines, cursor, more, source}`. Default returns
`supported: false` so non-claude-code runtimes pass through gracefully.
2. `ClaudeCodeAdapter.transcript_lines` override — reads the most-
recently-modified `.jsonl` in `~/.claude/projects/<cwd>/`. Resolves
cwd the same way `ClaudeSDKExecutor._resolve_cwd()` does so the
project dir name matches what Claude Code actually writes to. Limit
capped at 1000 to prevent OOM.
3. Workspace HTTP route `GET /transcript` — Starlette handler added
alongside the A2A app. Trusts the internal Docker network (same
model as POST / for A2A); Phase 30 remote-workspace auth is a
follow-up.
4. Platform proxy `GET /workspaces/:id/transcript` — looks up the
workspace's URL, forwards GET, caps response at 1MB. Gated by
existing `WorkspaceAuth` middleware (same as /traces, /memories,
/delegations).
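The paging behaviour in items 1 and 2 can be sketched roughly as below. This is a minimal synchronous illustration, not the adapter's code: the helper name `read_transcript`, the use of a line offset as the cursor, and the default page size are assumptions; only the result keys, the newest-`.jsonl` selection, and the 1000-line cap come from the description above.

```python
import json
from pathlib import Path

MAX_LIMIT = 1000  # cap stated in the PR description


def read_transcript(project_dir: Path, since: int = 0, limit: int = 100) -> dict:
    """Return one page of the most recently modified .jsonl in project_dir.

    Assumption: `since` and `cursor` are zero-based line offsets.
    """
    limit = max(1, min(limit, MAX_LIMIT))
    sessions = sorted(project_dir.glob("*.jsonl"), key=lambda p: p.stat().st_mtime)
    if not sessions:
        return {"runtime": "claude-code", "supported": True, "lines": [],
                "cursor": since, "more": False, "source": None}

    newest = sessions[-1]
    lines, cursor, more = [], since, False
    with newest.open(encoding="utf-8") as fh:
        for index, raw in enumerate(fh):
            if index < since:
                continue  # already delivered in an earlier page
            if len(lines) == limit:
                more = True  # at least one unread line remains
                break
            cursor = index + 1
            try:
                lines.append(json.loads(raw))
            except json.JSONDecodeError:
                continue  # skip malformed lines but still advance the cursor
    return {"runtime": "claude-code", "supported": True, "lines": lines,
            "cursor": cursor, "more": more, "source": str(newest)}
```

A caller would poll with `since` set to the previous `cursor` until `more` is false.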
Tests: 6 Python unit tests cover empty dir / pagination / multi-session
/ malformed lines / limit cap, plus 4 Go tests cover 404 / proxy
forwarding / query-string propagation / unreachable-workspace 502.
Verified end-to-end on a live workspace — returns real claude-code
session entries through the platform proxy.
## Follow-ups
- WebSocket variant for live streaming (instead of polling)
- Canvas UI tab "Transcript" between Activity and Traces
- LangGraph / DeepAgents / OpenClaw transcript adapters
- Phase 30 remote-workspace auth on /transcript
Completes the Phase 2 scope by keeping conversation turns as turns across
all three dispatch paths. Pre-2c, history was flattened into a single user
message via shared_runtime.build_task_text, which worked as a fallback but
lost the model's native multi-turn awareness (role attribution,
instruction-following on mid-conversation corrections, system-prompt
grounding against prior turns).
Phase 2a + 2b shipped the dispatch infrastructure + per-provider native
paths. This PR uses them properly.
## What's new
- **`_history_to_openai_messages(user_message, history)`** (static) — maps
A2A `(role, text)` tuples to OpenAI Chat Completions
`[{"role":"user"|"assistant","content":str}]`. Roles: `human`→`user`,
`ai`→`assistant`. Current turn appended as the final user message.
- **`_history_to_anthropic_messages`** (static) — identical wire shape to
OpenAI for text-only turns, so it delegates. Phase 2d tool_use/vision
blocks will diverge here.
- **`_history_to_gemini_contents`** (static) — Gemini uses a different
shape: `role="user"|"model"` (NOT "assistant") and text wrapped in
`parts=[{"text":...}]`. Delegates to none of the others.
- **`_do_openai_compat(user_message, history=None)`** — accepts history,
builds messages via `_history_to_openai_messages`. Back-compat: pass
`history=None` to get the old single-turn behavior.
- **`_do_anthropic_native(user_message, history=None)`** — same signature
change, calls `_history_to_anthropic_messages`. Still uses
`anthropic.AsyncAnthropic().messages.create()`, just with proper
multi-turn.
- **`_do_gemini_native(user_message, history=None)`** — same pattern,
calls `_history_to_gemini_contents`, passes to Gemini's
`generate_content(contents=...)`.
- **`_do_inference(user_message, history=None)`** — new signature,
dispatches by auth_scheme as before, passes both args through.
- **`execute()`** — no longer calls `build_task_text`. Calls
`extract_history(context)` directly and forwards to `_do_inference`.
Removes the `build_task_text` import (not needed in this file anymore).
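The three converters described above might look roughly like this. It is a sketch built only from the shapes stated in this description; the function names drop the leading underscore, they are written as module-level functions rather than static methods, and raising `KeyError` on an unknown role is an assumption, not confirmed behaviour.

```python
from typing import Optional

History = Optional[list[tuple[str, str]]]

_OPENAI_ROLES = {"human": "user", "ai": "assistant"}
_GEMINI_ROLES = {"human": "user", "ai": "model"}  # Gemini uses "model", not "assistant"


def history_to_openai_messages(user_message: str, history: History = None) -> list[dict]:
    """Map (role, text) tuples to Chat Completions messages, current turn last."""
    messages = [
        {"role": _OPENAI_ROLES[role], "content": text}
        for role, text in (history or [])
    ]
    messages.append({"role": "user", "content": user_message})
    return messages


def history_to_anthropic_messages(user_message: str, history: History = None) -> list[dict]:
    """Text-only turns share the OpenAI wire shape, so simply delegate."""
    return history_to_openai_messages(user_message, history)


def history_to_gemini_contents(user_message: str, history: History = None) -> list[dict]:
    """Gemini wraps text in parts=[{"text": ...}] and uses its own role names."""
    contents = [
        {"role": _GEMINI_ROLES[role], "parts": [{"text": text}]}
        for role, text in (history or [])
    ]
    contents.append({"role": "user", "parts": [{"text": user_message}]})
    return contents
```

With `history=None` or an empty list, each function returns a single user entry, which matches the single-turn behaviour described under Back-compat.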
## Tests
Existing 7 dispatch tests updated for the new `(user_message, history)`
signature — they assert the path is called with `("hello", None)` since
they pass no history.
5 NEW tests:
- `test_history_to_openai_messages_empty_history` — empty history degrades
to single user message (back-compat)
- `test_history_to_openai_messages_multi_turn` — round-trip of a 3-turn
history + current turn
- `test_history_to_anthropic_messages_same_as_openai` — cross-check that
anthropic path produces identical wire shape for text-only
- `test_history_to_gemini_contents_uses_model_role_and_parts_wrapper` —
verifies the Gemini-specific role mapping (`ai`→`model`) + parts wrapper
- `test_dispatch_passes_history_through` — end-to-end: _do_inference
forwards history to the chosen provider path
All 41 tests pass (15 Phase 2 dispatch + 26 Phase 1 registry):
pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
41 passed in 0.07s
## Back-compat
- No public API changes to `create_executor()`. Callers that hit
`execute()` via A2A get the new multi-turn behavior automatically via
`extract_history(context)`.
- Callers that passed an empty history list (or None) get the same
single-turn behavior as pre-2c.
- The `build_task_text` helper in shared_runtime is unchanged — other
adapters (AutoGen, LangGraph) that use it keep working. Only Hermes
bypasses it now.
## What's NOT in this PR (Phase 2d)
- Tool calling / function calling on native paths (anthropic `tools=`,
gemini `tools=Tool(function_declarations=[...])`)
- Vision content blocks (image_url → anthropic `{type:"image", source:
{type:"base64",...}}` / gemini `{inline_data:{mime_type,data}}`)
- System instructions pass-through (anthropic `system=`, gemini
`system_instruction=`)
- Streaming (`astream_messages` / `streamGenerateContent` stream variants)
- Extended thinking (anthropic `thinking={"type":"enabled"}`) / Gemini
thinking config
Phase 2c is the **multi-turn upgrade**. Tool + vision + streaming are
Phase 2d, scoped in project_hermes_multi_provider.md.
## Related
- #240 Phase 2a (native Anthropic dispatch — in main)
- #255 Phase 2b (native Gemini dispatch — in main)
- Phase 1 (#208 — provider registry baseline, in main)
- `project_hermes_multi_provider.md` queued memory
- CEO 2026-04-15: "focus on supporting hermes agent"
Closes #256. Per CEO direction, shipping three separate opt-in plugins
instead of one bundled "compliance-posture" — keeps installs granular
so a workspace that only wants CVE scanning doesn't carry OWASP policy
or append-only audit retention.
- plugins/molecule-compliance/ — wraps compliance.py (OWASP OA-01
prompt injection + OA-03 excessive agency). Skill: owasp-agentic.
- plugins/molecule-audit/ — wraps audit.py (EU AI Act Art.
12/13/17 append-only JSONL log, SIEM-friendly). Skill: ai-act-audit-log.
- plugins/molecule-security-scan/ — wraps security_scan.py (Snyk or
pip-audit CVE gate on skill requirements.txt). Skill: skill-cve-gate.
Each plugin ships a manifest + one SKILL.md with:
- When to install / when to skip
- Configuration shape (config.yaml blocks)
- Anti-patterns to avoid
- Cross-references to the other two plugins so an operator can reason
about the full compliance surface
All three wrap code that already exists in workspace-template/builtin_tools/
— no Python changes. Install per workspace via
POST /workspaces/:id/plugins {"source":"builtin://molecule-<name>"}.
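For illustration, the install call could be built like this with the standard library. The helper name, base URL and workspace id are placeholders, and the request omits whatever authentication the platform requires; only the route and the `source` payload shape come from the text above.

```python
import json
import urllib.request


def build_install_request(base_url: str, workspace_id: str, plugin: str) -> urllib.request.Request:
    """Build (but do not send) the plugin-install request for one workspace."""
    body = json.dumps({"source": f"builtin://{plugin}"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/workspaces/{workspace_id}/plugins",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and workspace id, for illustration only.
request = build_install_request("http://platform.example", "ws-123", "molecule-audit")
```

Passing the request to `urllib.request.urlopen` would send it.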
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tick 32 (manual) merged a large batch of PRs — the test counts in
CLAUDE.md were drifting behind reality by enough to matter:
- platform: 816 → 818 (YAML injection fix + sanitizeRuntime allowlist)
- canvas: 453 → 482 (12 CookieConsent + 17 PricingTable/billing)
- workspace-template: 1180 → 1179 (Hermes Phase 2a/2b dispatch tests
landed but the test_hermes_providers env-var-leak fix removed a
fragile flake-path count; net -1)
These counts are measured, not guessed: each comes from running the full
suite on fresh main.
Not in this sync but worth mentioning for the next retrospective:
- controlplane repo received the full GDPR/admin/usage/consent/email
stack (#29-#34) — that work sits in molecule-controlplane, not
monorepo CLAUDE.md
- monorepo picked up /pricing route, cookie consent banner, molecule-
hitl plugin (#262), Hermes Phase 2a native Anthropic + 2b Gemini
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #257. Thin manifest + skill doc that activates the existing
builtin_tools/hitl.py primitives as a per-workspace opt-in plugin.
The Python implementation (@requires_approval decorator, pause_task /
resume_task tools, multi-channel notification, RBAC bypass roles) is
already in every runtime image — this plugin is the policy layer that
tells agents *when* to call them.
- plugins/molecule-hitl/plugin.yaml — runtimes: langgraph, claude_code,
deepagents; skills: hitl-gates
- plugins/molecule-hitl/skills/hitl-gates/SKILL.md — documents the 5
classes of action that need a gate (deployment / irreversible FS /
public message / production mutation / cross-workspace destructive),
decorator pattern, pause/resume pattern, config shape, 4 anti-patterns,
5-step test plan
No Python code — all implementation already exists. Install per
workspace via POST /workspaces/:id/plugins.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-existing flaky test: when the full workspace-template suite ran in
collection order,
test_hermes_smoke.py::test_create_executor_raises_without_keys failed
with "DID NOT RAISE ValueError". The failure only surfaced when
test_hermes_providers ran first.
Root cause: test_hermes_providers had an autouse fixture that used
monkeypatch.delenv on entry, but several tests in that file mutate
os.environ directly (e.g. `os.environ["HERMES_API_KEY"] = "test"`),
bypassing monkeypatch. monkeypatch only tracks its own deltas, so on
fixture teardown the direct-mutation values stayed in os.environ.
HERMES_API_KEY leaked across file boundaries into test_hermes_smoke,
which then saw a key present when it expected absence.
Fix: replace monkeypatch-based fixture with pure snapshot/restore:
- Snapshot all provider env vars at entry
- Clear them
- yield (test runs, may mutate freely)
- try/finally restore the exact pre-test state
This is deterministic regardless of whether a test uses monkeypatch,
direct mutation, or neither. Also adds a comment documenting WHY we
switched away from monkeypatch so a future reviewer doesn't revert.
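The snapshot/restore steps can be sketched as a plain context manager. This is an illustration rather than the fixture in the test file: the variable list is an assumption limited to key names mentioned elsewhere in this log, and in a pytest suite the same body would sit inside an autouse fixture.

```python
import os
from contextlib import contextmanager

# Assumed subset of provider variables; the real fixture's list may differ.
PROVIDER_ENV_VARS = ("HERMES_API_KEY", "OPENROUTER_API_KEY", "GEMINI_API_KEY")


@contextmanager
def isolated_provider_env(names=PROVIDER_ENV_VARS):
    """Clear the named env vars for the duration of the block, then restore them."""
    snapshot = {name: os.environ.get(name) for name in names}  # None means "was unset"
    for name in names:
        os.environ.pop(name, None)
    try:
        yield  # the test runs here and may mutate os.environ freely
    finally:
        for name, value in snapshot.items():
            if value is None:
                os.environ.pop(name, None)  # drop anything the test leaked
            else:
                os.environ[name] = value  # put the original value back
```

Because the restore step reads only the snapshot, it does not matter whether the test changed the variables through monkeypatch or by writing to `os.environ` directly.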
Full workspace-template suite: 1169 passed, 9 skipped, 2 xfailed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a public /pricing route the apex + tenant canvas can both serve.
Three-tier plan cards (Free, Starter, Pro) with per-plan CTA buttons
that dispatch correctly regardless of the user's state:
Free → redirect to signup
Anonymous + paid → redirect to signup (Stripe opens post-auth)
Authed + paid → POST /cp/billing/checkout, redirect to Stripe URL
No tenant slug → inline error ("pick an org first")
Network failures → surfaced in an ARIA alert banner
Files:
- src/lib/billing.ts — plan metadata + startCheckout + openBillingPortal
wrappers over /cp/billing/{checkout,portal}
- src/components/PricingTable.tsx — client component, lazy session
probe on first CTA click (no probe for anonymous browsers)
- src/app/pricing/page.tsx — server-rendered shell with SEO metadata,
links to legal pages in the footer
- Tests: 10 billing helper tests + 9 PricingTable tests (17 total,
additional ones cover the plan-list canonical order)
Design notes:
- The pricing data (features + prices) is a static const in billing.ts,
not fetched from the API. Changing prices requires a deploy — which
we'd need to do anyway for tier definition changes.
- PLAN_ID 'starter' is flagged highlighted=true so the middle card gets
the 'Most popular' visual treatment. One source of truth; test locks it.
- Session probe is lazy (first CTA click, not mount) so anonymous
visitors don't generate a /cp/auth/me request just to read the page.
AuthGate interaction:
- On apex (no tenant slug), AuthGate passes through, so /pricing renders freely
- On tenant subdomain, AuthGate still bounces anonymous users to login
before reaching /pricing — this is the correct UX for the "I'm already
logged in and want to upgrade my own org" flow
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extends the secret map with RESEND_API_KEY, RESEND_FROM_EMAIL,
STRIPE_API_KEY, STRIPE_WEBHOOK_SECRET — the four SaaS secrets the
control plane reads once the current PR stack (#29-#34 on
molecule-controlplane) ships.
Adds rotation procedures for each:
- Resend: low-blast-radius, best-effort sends, domain verification
gotcha documented
- Stripe API key: independent rotation from webhook secret, live verify
via /cp/billing/checkout
- Stripe webhook secret: 24h overlap window procedure using stripe
trigger for live verify
Also adds Resend + Stripe entries to the emergency-contacts list.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Completes Hermes Phase 2 by adding the second native SDK path: Google Gemini
via the official `google-genai` Python SDK. Stacked on top of Phase 2a
(feat/hermes-phase2-native-sdks) which introduced the dispatch infra +
the anthropic native path.
## What's new in this PR
1. `providers.py`: flip `gemini` entry to `auth_scheme="gemini"` and
update `base_url` from the OpenAI-compat endpoint
(`/v1beta/openai`) to the bare host
(`https://generativelanguage.googleapis.com`) which the native SDK
uses.
2. `executor.py`: new method `_do_gemini_native(task_text)` that uses
`google.genai.Client().aio.models.generate_content(...)`. Dispatch
table in `_do_inference` now routes `"gemini"` → `_do_gemini_native`.
Same fail-loud semantics as `_do_anthropic_native` — missing SDK
raises a clear RuntimeError with install instructions.
3. `requirements.txt`: add `google-genai>=1.0.0`.
4. `test_hermes_phase2_dispatch.py`: +3 tests
- `test_gemini_entry_has_gemini_scheme` — registry flip + base URL
validated
- `test_dispatch_gemini_scheme_calls_gemini_native` — dispatch runs
gemini native, not openai-compat or anthropic-native
- `test_gemini_native_raises_clear_error_when_sdk_missing` — fail-loud
on missing `google-genai` package
Plus updated existing dispatch tests to mock `_do_gemini_native`
alongside the other paths so "no cross-calls" assertions stay tight.
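The fail-loud behaviour on a missing SDK can be illustrated with a small helper. `require_sdk` is a hypothetical name, and the error wording is an assumption; the description above only states that a clear RuntimeError with install instructions is raised.

```python
import importlib


def require_sdk(module_name: str, pip_name: str):
    """Import an optional SDK, or raise a RuntimeError that says how to install it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"{pip_name} is required for this provider path; "
            f"install it with: pip install {pip_name}"
        ) from exc
```

A native path would call something like `require_sdk("google.genai", "google-genai")` before building its client, so a missing package surfaces as an actionable message rather than a bare ImportError deep in the call stack.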
All 36 tests pass locally (10 Phase 2 dispatch + 26 Phase 1 registry):
pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
36 passed in 0.07s
## Dispatch table after this PR
auth_scheme="openai" → _do_openai_compat (13 providers)
auth_scheme="anthropic" → _do_anthropic_native (1 provider, Phase 2a)
auth_scheme="gemini" → _do_gemini_native (1 provider, Phase 2b) ← NEW
<unknown> → _do_openai_compat + warning (forward-compat)
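The routing in this table, including the warn-and-fall-back branch for unknown schemes, could be expressed along these lines. `pick_handler` and the logger name are hypothetical; the handler values are stand-ins for the executor methods named in the table.

```python
import logging

logger = logging.getLogger("hermes.dispatch")  # hypothetical logger name


def pick_handler(auth_scheme: str, handlers: dict, default_scheme: str = "openai"):
    """Return the handler for auth_scheme, falling back to the default with a warning."""
    handler = handlers.get(auth_scheme)
    if handler is None:
        logger.warning(
            "unknown auth_scheme %r; falling back to %s", auth_scheme, default_scheme
        )
        handler = handlers[default_scheme]
    return handler


# Stand-ins for the executor's methods; here each just reports its own name.
HANDLERS = {
    "openai": lambda: "_do_openai_compat",
    "anthropic": lambda: "_do_anthropic_native",
    "gemini": lambda: "_do_gemini_native",
}
```

Keeping the fallback in one place means a registry entry with a scheme added later still gets a working path, which is the forward-compat behaviour the last row describes.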
## Back-compat
- All 13 openai-scheme providers unchanged
- `hermes_api_key` / `HERMES_API_KEY` / `OPENROUTER_API_KEY` paths unchanged
- Only `gemini` provider changes behavior: now uses native generateContent
instead of the `/v1beta/openai` compat shim
- Existing Gemini callers setting `GEMINI_API_KEY` get the native path
automatically — no caller changes needed
## What's NOT in this PR (future phases)
- Streaming support (`astream_messages` / `streamGenerateContent` stream
variants) for either native path
- Tool calling / function calling on native paths
- Vision content blocks (image_url → anthropic image blocks; image_url →
gemini inline_data with base64 + mime_type)
- Extended thinking (anthropic) / thinking config (gemini)
- System instructions pass-through on the gemini native path
Phase 2c/2d will layer these on. This PR is the minimum-viable native
dispatch — single-turn text in, text out — same shape as Phase 2a.
## Stacking
This PR targets `feat/hermes-phase2-native-sdks` (Phase 2a) as its base
branch, NOT main, so the diff shows only the Gemini-specific additions.
When Phase 2a merges to main and its branch is deleted, GitHub retargets
this PR to main. If the reviewer prefers a single combined PR, close #240
and land this one instead; the commits on feat/hermes-phase2-native-sdks
are already included in this branch's history.
## Related
- #240 Phase 2a (parent branch)
- #208 Phase 1 (registry + openai-compat path — already in main)
- `project_hermes_multi_provider.md` queued memory — Phase 2 was the next
item, this PR completes it
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's
eco-watch entry that catalogued Hermes's native provider list and
shaped the original Phase 2 scope
Completes Hermes Phase 2 by adding the second native SDK path: Google Gemini
via the official `google-genai` Python SDK. Stacked on top of Phase 2a
(feat/hermes-phase2-native-sdks) which introduced the dispatch infra +
the anthropic native path.
## What's new in this PR
1. `providers.py`: flip `gemini` entry to `auth_scheme="gemini"` and
update `base_url` from the OpenAI-compat endpoint
(`/v1beta/openai`) to the bare host
(`https://generativelanguage.googleapis.com`) which the native SDK
uses.
2. `executor.py`: new method `_do_gemini_native(task_text)` that uses
`google.genai.Client().aio.models.generate_content(...)`. Dispatch
table in `_do_inference` now routes `"gemini"` → `_do_gemini_native`.
Same fail-loud semantics as `_do_anthropic_native` — missing SDK
raises a clear RuntimeError with install instructions.
3. `requirements.txt`: add `google-genai>=1.0.0`.
4. `test_hermes_phase2_dispatch.py`: +3 tests
- `test_gemini_entry_has_gemini_scheme` — registry flip + base URL
validated
- `test_dispatch_gemini_scheme_calls_gemini_native` — dispatch runs
gemini native, not openai-compat or anthropic-native
- `test_gemini_native_raises_clear_error_when_sdk_missing` — fail-loud
on missing `google-genai` package
Plus updated existing dispatch tests to mock `_do_gemini_native`
alongside the other paths so "no cross-calls" assertions stay tight.
All 36 tests pass locally (10 Phase 2 dispatch + 26 Phase 1 registry):
pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
36 passed in 0.07s
## Dispatch table after this PR
auth_scheme="openai" → _do_openai_compat (13 providers)
auth_scheme="anthropic" → _do_anthropic_native (1 provider, Phase 2a)
auth_scheme="gemini" → _do_gemini_native (1 provider, Phase 2b) ← NEW
<unknown> → _do_openai_compat + warning (forward-compat)
## Back-compat
- All 13 openai-scheme providers unchanged
- `hermes_api_key` / `HERMES_API_KEY` / `OPENROUTER_API_KEY` paths unchanged
- Only `gemini` provider changes behavior: now uses native generateContent
instead of the `/v1beta/openai` compat shim
- Existing Gemini callers setting `GEMINI_API_KEY` get the native path
automatically — no caller changes needed
## What's NOT in this PR (future phases)
- Streaming support (`astream_messages` / `streamGenerateContent` stream
variants) for either native path
- Tool calling / function calling on native paths
- Vision content blocks (image_url → anthropic image blocks; image_url →
gemini inline_data with base64 + mime_type)
- Extended thinking (anthropic) / thinking config (gemini)
- System instructions pass-through on the gemini native path
Phase 2c/2d will layer these on. This PR is the minimum-viable native
dispatch — single-turn text in, text out — same shape as Phase 2a.
## Stacking
This PR targets `feat/hermes-phase2-native-sdks` (Phase 2a) as its base
branch, NOT main, so the diff shows only the Gemini-specific additions.
When Phase 2a merges to main and its branch is deleted, GitHub
automatically retargets this PR to main. If the reviewer prefers a
single combined PR, close #240 and land this one instead; the commits
on feat/hermes-phase2-native-sdks are already included in this
branch's history.
## Related
- #240 Phase 2a (parent branch)
- #208 Phase 1 (registry + openai-compat path — already in main)
- `project_hermes_multi_provider.md` queued memory: Phase 2 was the next
item, and this PR completes it
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's
eco-watch entry that catalogued Hermes's native provider list and
shaped the original Phase 2 scope
Closes #248. Three instances of the same YAML-injection bug class
(#221 name/role, #233 template path, #241 runtime/model) shipped in
this repo over the last few weeks. The common root cause is that the
Security Auditor's system prompt did not list YAML injection as an
explicit check class, so audits missed the pattern every time.
Adds:
- "YAML injection" to the 'Think like an attacker' list in How You Work
- An explicit entry in What You Check with the three prior instances
cited so future auditors see the pattern and the fix shape
(double-quoted scalars or a proper YAML encoder)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #241 (MEDIUM, auth-gated by AdminAuth on POST /workspaces).
## Vectors closed
1. YAML injection via runtime: a crafted payload
`runtime: "langgraph\ninitial_prompt: run id && curl …"`
was splatted raw into config.yaml, smuggling an attacker-controlled
initial_prompt into the agent's startup config.
2. Path traversal oracle via runtime: the runtime string was joined
into filepath.Join for the runtime-default template fallback.
`runtime: ../../sensitive` could probe host directory existence.
3. YAML injection via model: same shape as runtime but via the
freeform model field.
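
To make the first two vectors concrete, here is a small Python illustration (the platform code itself is Go). The `naive_config` helper, the payload, and the `templates` directory are hypothetical; they only mimic the pre-fix behaviour of interpolating the raw string into the file and into a path.

```python
import posixpath


def naive_config(runtime: str) -> str:
    """Mimics the pre-fix behaviour: the value is interpolated unquoted."""
    return f"name: demo\nruntime: {runtime}\n"


# A newline inside the value starts a new top-level line in the output,
# so the attacker-chosen key sits next to the legitimate ones.
payload = "langgraph\ninitial_prompt: injected"
rendered = naive_config(payload)
top_level_keys = [line.split(":", 1)[0] for line in rendered.splitlines()]

# Joining an unvalidated value into a path lets it climb out of the
# intended base directory once the path is normalised.
template_dir = posixpath.normpath(posixpath.join("templates", "../../sensitive"))
```

Here `top_level_keys` ends up containing `initial_prompt` alongside `name` and `runtime`, and `template_dir` no longer sits under `templates`.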
## Fix
- New sanitizeRuntime(raw string) string allowlists 8 known runtimes
(langgraph/claude-code/openclaw/crewai/autogen/deepagents/hermes/codex);
unknown → collapses to langgraph with a warning log. Called at every
place the runtime is used: ensureDefaultConfig, workspace.go:175
runtimeDefault fallback, org.go:370 runtimeDefault fallback.
- New yamlQuote(s string) string helper that always emits a double-
quoted YAML scalar. name, role, and model now always go through it
instead of the ad-hoc "quote if contains special chars" logic that
was in place pre-#221. Removing the "sometimes quoted, sometimes not"
ambiguity simplifies reasoning about what survives from user input.
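
A rough Python rendering of the two helpers, for readers who want the shape of the fix without opening the Go source. The allowlist entries are taken from the list above; the names `sanitize_runtime` and `yaml_quote` and the exact escaping rules are assumptions, and the real Go helpers may differ in detail (for example in how empty or whitespace-only input is handled).

```python
import logging

logger = logging.getLogger("workspace.config")

KNOWN_RUNTIMES = frozenset({
    "langgraph", "claude-code", "openclaw", "crewai",
    "autogen", "deepagents", "hermes", "codex",
})
DEFAULT_RUNTIME = "langgraph"


def sanitize_runtime(raw: str) -> str:
    """Return raw if it is an allowlisted runtime, else the default."""
    if raw in KNOWN_RUNTIMES:
        return raw
    logger.warning("unknown runtime %r; using %s", raw, DEFAULT_RUNTIME)
    return DEFAULT_RUNTIME


# Escapes for a YAML double-quoted scalar. Only the common cases are
# covered here; other control characters would need \x-style escapes.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def yaml_quote(s: str) -> str:
    """Always emit a double-quoted YAML scalar, never a bare one."""
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in s) + '"'
```

Because an allowlisted value can contain neither a newline nor a path separator, the allowlist covers both the injection and the traversal vector for `runtime`; `yaml_quote` handles the free-form fields, where an allowlist is not possible.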
## Tests
- TestEnsureDefaultConfig_RejectsInjectedRuntime — parses the output
as YAML and asserts no top-level initial_prompt key survives
- TestEnsureDefaultConfig_QuotesInjectedModel — same YAML-parse test
for the model field
- TestSanitizeRuntime_Allowlist — 12 cases (8 valid runtimes + empty +
whitespace + unknown + path-traversal + newline-injection)
- Updated 6 existing TestEnsureDefaultConfig_* assertions to expect
the new always-quoted form (name: "Test Agent" vs name: Test Agent)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #250 (MEDIUM). POST /channels/discover was on the open router
and accepted an arbitrary Telegram bot token, turning it into:
1. A free bot-token validity oracle — attackers can enumerate/probe
tokens at zero cost
2. A drive-by deleteWebhook side effect — every call invokes
tgbotapi.DeleteWebhookConfig against the target bot, breaking
legitimate webhook delivery
3. A rate-limit amplifier — getMe + deleteWebhook + getUpdates per call
Fix: one-line addition of middleware.AdminAuth(db.DB) to the route,
matching its actual intent (platform-operator admin helper, not a
per-workspace route). Pattern mirrors /admin/liveness, /events, and
/bundles/export from PR #167.
No new test: AdminAuth behavior is covered by
wsauth_middleware_test.go; this PR only wires it onto an additional
route. The load-bearing code comment references #250 so future
reviewers can't revert without an issue citation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>