From 7224276de0bbefea2b1bc8ae23c3917b9d613d28 Mon Sep 17 00:00:00 2001 From: Hongming Wang Date: Sat, 2 May 2026 02:21:11 -0700 Subject: [PATCH] feat: register codex runtime + runtime native-MCP design docs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds the OpenAI Codex CLI as a Molecule workspace runtime and lands the design docs that drove the runtime native-MCP push parity work across claude-code, hermes, openclaw, and codex. manifest.json: - Adds `codex` workspace_template entry pointing at the new Molecule-AI/molecule-ai-workspace-template-codex repo (initial commit landed there in parallel; 14 files / 1411 LOC). The workspace-server runtime registry already had `codex` in its fallback set — this entry makes it manifest-reachable in prod. docs/integrations/: - runtime-native-mcp-status.md — index across all four runtime streams - codex-app-server-adapter-design.md — full design including v2 RPC sequence, executor skeleton, schema-vs-runtime drift findings (real codex 0.72 returns thread.id, schema says thread.threadId) - hermes-platform-plugins-upstream-pr.md — pre-submission draft of the hermes-agent upstream PR Co-Authored-By: Claude Opus 4.7 (1M context) --- .../codex-app-server-adapter-design.md | 360 ++++++++++++++++++ .../hermes-platform-plugins-upstream-pr.md | 191 ++++++++++ .../integrations/runtime-native-mcp-status.md | 162 ++++++++ manifest.json | 3 +- 4 files changed, 715 insertions(+), 1 deletion(-) create mode 100644 docs/integrations/codex-app-server-adapter-design.md create mode 100644 docs/integrations/hermes-platform-plugins-upstream-pr.md create mode 100644 docs/integrations/runtime-native-mcp-status.md diff --git a/docs/integrations/codex-app-server-adapter-design.md b/docs/integrations/codex-app-server-adapter-design.md new file mode 100644 index 00000000..054a330e --- /dev/null +++ b/docs/integrations/codex-app-server-adapter-design.md @@ -0,0 +1,360 @@ +# Codex CLI workspace adapter — app-server design + +**Status:** Design draft — pre-implementation +**Owner:** Molecule AI (hongmingwang@moleculesai.app) +**Date:** 2026-05-02 +**Codex version validated against:** `codex-cli 0.72.0` +**Related:** `docs/integrations/hermes-platform-plugins-upstream-pr.md`, +`molecule-ai-workspace-template-openclaw/packages/openclaw-channel-plugin/` + +--- + +## Goal + +Add a Molecule workspace template for the OpenAI Codex CLI runtime +(`@openai/codex` v0.72+). The template should give Codex agents the +same A2A inbox + mid-session push behavior the other supported +runtimes have: + +- **claude-code:** MCP `notifications/claude/channel` +- **OpenClaw:** channel-plugin webhook into the gateway kernel +- **hermes:** `BasePlatformAdapter` (pending upstream PR; polling fallback today) +- **codex (this design):** persistent `codex app-server` stdio JSON-RPC + client; A2A messages become `turn/start` calls against a long-lived + thread + +Today there is no codex template. The legacy fallback registry entry +at `workspace-server/internal/handlers/runtime_registry.go:83` exists +only to keep old workspaces from crashing — there is no live adapter, +no Dockerfile, nothing in `manifest.json`. This design covers the +fresh build. + +--- + +## Architecture decision: app-server, not `codex exec` + +`codex exec --json` is the obvious shape — one CLI subprocess per +A2A message, same anti-pattern OpenClaw used to have and that we are +replacing. 
It loses session continuity (no shared thread), pays +process-spawn cost on every turn, and gives no path to mid-turn +interruption. + +`codex app-server` is a long-running JSON-RPC server over stdio that +holds thread state in memory. The v2 protocol (validated below) gives +us: + +- `thread/start` → returns `threadId` +- `turn/start` → input array, threadId required → returns `turnId` +- `turn/interrupt` → cancel a running turn by `(threadId, turnId)` +- Server-pushed notifications: `agent_message_delta`, `turn/started`, + `turn/completed`, `reasoning_text_delta`, + `command_execution_output_delta`, `mcp_tool_call_progress`, + `error_notification`, etc. + +A persistent app-server child plus a small async stdio reader gives us +session continuity AND mid-turn injection. Same dual-win shape we got +from migrating OpenClaw away from `openclaw agent`. + +### Why not v1? + +v1 of the protocol exposes `newConversation` + `sendUserMessage` / +`sendUserTurn` (one-shot per message, no streaming notifications). v2 +introduces threads + turns + delta notifications. v2 is the +forward-looking surface; we build against v2 from the start. + +--- + +## RPC sequence + +### 1. Boot + +``` +adapter spawn ▶ codex app-server (stdio NDJSON) + ◀ ready (process up) +adapter ▶ {"jsonrpc":"2.0","id":1,"method":"initialize", + "params":{"clientInfo":{"name":"molecule-runtime","version":"…"}}} +adapter ◀ {"id":1,"result":{"userAgent":"codex_cli_rs/0.72.0 …"}} +``` + +Validated 2026-05-02 against the installed binary — NDJSON framing, +initialize works as shown. + +### 2. Thread per workspace session + +``` +adapter ▶ thread/start + params: {model, sandboxPolicy, approvalPolicy, cwd, + baseInstructions, developerInstructions, …} +adapter ◀ {result: {thread: {threadId: "th_…"}}} +``` + +`threadId` is cached on the adapter for the workspace's lifetime. On +adapter restart we use `thread/resume` against the persisted ID +(written to disk under `~/.codex/sessions/` by codex itself, but we +also keep our own pointer in workspace state for fast restore). + +### 3. A2A message → turn/start + +For each inbound A2A message: + +``` +adapter ▶ turn/start + params: {threadId, input: [{type:"text", text:"…"}], …} +adapter ◀ {result: {turn: {turnId: "tu_…"}}} + +(server pushes notifications) +adapter ◀ turn/started +adapter ◀ agent_message_delta (text chunk) +adapter ◀ agent_message_delta (text chunk) +… +adapter ◀ turn/completed +``` + +The adapter accumulates `agent_message_delta` chunks into a buffer +keyed by `turnId`, emits them onto the A2A response queue (streamed if +the molecule-runtime contract supports streaming, otherwise assembled +into a single final message on `turn/completed`). + +### 4. Mid-turn injection — the load-bearing case + +**Default policy: per-thread serialization.** If a turn is already +running when a second A2A message arrives, queue the new message and +fire `turn/start` once the current `turn/completed` lands. This +matches OpenClaw's per-chat sequentializer behavior — the A2A peer +sees their messages handled in order, and we don't need +`turn/interrupt` for the common case. + +**Opt-in policy: interrupt-and-rerun.** For workspaces that prefer +"latest message wins" semantics (rare; configurable), the adapter +fires `turn/interrupt` with `(threadId, currentTurnId)`, waits for +`turn/completed` (with cancelled status), then `turn/start` with the +combined context: previous user message + agent's partial response so +far + new message, so the agent has full context of what got +interrupted. 
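A sketch of that opt-in path, shaped like the executor skeleton further
down (the `_current_turn_id` / `_partial_text` / `_last_user_text`
attributes are illustrative bookkeeping, not protocol fields):

```python
async def _interrupt_and_rerun(self, new_text: str) -> None:
    # Cancel the in-flight turn, wait for its (cancelled) turn/completed,
    # then restart with the combined context described above.
    await self._app_server.request("turn/interrupt", {
        "threadId": self._thread_id,
        "turnId": self._current_turn_id,
    })
    await self._turn_done.wait()  # turn/completed lands with cancelled status
    combined = (
        "Your previous turn was interrupted by a new message.\n\n"
        f"Original message:\n{self._last_user_text}\n\n"
        f"Your partial response so far:\n{self._partial_text}\n\n"
        f"New message:\n{new_text}"
    )
    await self._app_server.request("turn/start", {
        "threadId": self._thread_id,
        "input": [{"type": "text", "text": combined}],
    })
```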
Off by default. + +### 5. Shutdown + +``` +adapter ▶ {"method":"shutdown"} (if v2 exposes one; otherwise SIGTERM) +adapter ▶ close stdio +adapter ▶ wait(child, timeout=5s); on timeout SIGKILL +``` + +--- + +## File layout (new template repo) + +``` +molecule-ai-workspace-template-codex/ +├── adapter.py # BaseAdapter shell, thin (~50 LOC) +├── executor.py # AppServerProxyExecutor — the RPC client (~300 LOC) +├── app_server.py # AppServerProcess — stdio child + NDJSON reader (~150 LOC) +├── config.yaml +├── Dockerfile # node:20 + npm i -g @openai/codex@0.72 +├── start.sh # boots adapter; codex app-server is spawned per session by executor +├── requirements.txt +├── README.md +└── tests/ + ├── test_app_server.py # mocks stdio; tests framing, request/notification dispatch + └── test_executor.py # mocks AppServerProcess; tests turn lifecycle, interrupt +``` + +Modeled on the hermes template (which is the closest existing shape: +adapter.py + executor.py separation; daemon proxy via local IPC). The +extra `app_server.py` exists because the JSON-RPC client + child +process management is non-trivial enough to warrant its own module +with its own tests. + +--- + +## Executor skeleton + +```python +# executor.py — A2A → codex app-server bridge + +class CodexAppServerExecutor(AgentExecutor): + """Holds one app-server child + thread, dispatches A2A turns as turn/start RPCs.""" + + def __init__(self, config: AdapterConfig): + self._config = config + self._app_server: AppServerProcess | None = None + self._thread_id: str | None = None + self._turn_lock = asyncio.Lock() # serialize per-thread by default + + async def _ensure_thread(self) -> str: + if self._app_server is None: + self._app_server = await AppServerProcess.start() + await self._app_server.initialize(client_info={ + "name": "molecule-runtime", + "version": MOLECULE_RUNTIME_VERSION, + }) + if self._thread_id is None: + resp = await self._app_server.request("thread/start", { + "model": self._config.model or None, + "developerInstructions": self._config.system_prompt or None, + # other policy fields (sandbox, approval) — Molecule defaults + }) + self._thread_id = resp["thread"]["threadId"] + return self._thread_id + + async def execute(self, context: RequestContext, event_queue: EventQueue) -> None: + prompt = extract_message_text(context.message) or "" + if not prompt.strip(): + await event_queue.enqueue_event(new_agent_text_message("(empty prompt)")) + return + + async with self._turn_lock: # per-thread serialization + thread_id = await self._ensure_thread() + + # Subscribe to delta notifications BEFORE starting the turn so we + # don't race the first agent_message_delta. 
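        # NOTE: turn-id bookkeeping is elided in this sketch — a fuller
        # version would record the turnId returned by turn/start below on
        # self._current_turn_id (clearing it on turn/completed), since
        # cancel() further down relies on that attribute.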
+ buffer: list[str] = [] + done = asyncio.Event() + error: Exception | None = None + + def on_notification(method: str, params: dict) -> None: + nonlocal error + if method == "agent_message_delta": + buffer.append(params.get("delta", "")) + elif method == "turn/completed": + done.set() + elif method == "error_notification": + error = RuntimeError(params.get("message", "unknown app-server error")) + done.set() + + unsub = self._app_server.subscribe(on_notification) + try: + resp = await self._app_server.request("turn/start", { + "threadId": thread_id, + "input": [{"type": "text", "text": prompt}], + }) + turn_id = resp["turn"]["turnId"] + await asyncio.wait_for(done.wait(), timeout=_TURN_TIMEOUT) + finally: + unsub() + + if error: + await event_queue.enqueue_event( + new_agent_text_message(f"[codex error] {error}")) + return + await event_queue.enqueue_event(new_agent_text_message("".join(buffer))) + + async def cancel(self, context: RequestContext, event_queue: EventQueue) -> None: + # When the molecule-runtime cancels a request, fire turn/interrupt + # against the currently-running turn. Best-effort — racing + # turn/completed is fine, app-server returns a noop in that case. + if self._app_server and self._thread_id and self._current_turn_id: + await self._app_server.request("turn/interrupt", { + "threadId": self._thread_id, + "turnId": self._current_turn_id, + }) +``` + +The `AppServerProcess` class encapsulates: stdio child management, +NDJSON line reader/writer, request-id correlation, notification +subscriber registry, and graceful shutdown. Standard async stdio +JSON-RPC client — nothing exotic. + +--- + +## Open questions to resolve before implementation + +1. **MoleculeRuntime streaming contract.** Does our A2A executor + contract support emitting incremental events (so the user sees + partial responses as the agent streams), or do we always assemble + on `turn/completed`? If streaming is supported, we want to forward + each `agent_message_delta` as an A2A event for parity with hermes + gateway streaming. (Cross-reference: hermes adapter currently + doesn't stream either — `executor.py:122` sets `stream=False` — + so non-streaming is the safe v1 baseline.) + +2. **Sandbox policy default.** Codex defaults to `read-only` for safety + in CLI mode; for workspace use we need write access to the + workspace tree. Pick a sensible default in `thread/start` — + probably `workspace-write` scoped to the workspace cwd. + +3. **Approval policy default.** Codex's `--ask-for-approval` modes + (`untrusted`, `on-failure`, `never`). Workspace agents need + `never` (they can't prompt a human). Confirm this is exposed via + `approvalPolicy` in `thread/start`. + +4. **Auth — login flow.** Codex supports `login api-key` (env + `OPENAI_API_KEY`) and `login chatgpt` (interactive OAuth). For + workspace use we mandate API key. Document this in the template's + README and surface it as a required env in config.yaml. + +5. **MCP server passthrough.** Codex's own `mcp_servers` config lets + the agent call out to MCP servers as a CLIENT. Should the workspace + adapter automatically wire `~/.codex/config.toml` so the agent can + reach the molecule MCP server (chat_history, recall_memory, + delegate_task)? Almost certainly yes — but verify the env-var + substitution pattern works in TOML. + +6. **Thread persistence across workspace restarts.** Codex stores + sessions on disk under `~/.codex/sessions/`. 
The adapter should + persist the threadId in workspace state so a restart resumes the + thread (`thread/resume`) rather than starting fresh. This matches + the existing molecule-runtime convention for session continuity. + +7. **Token usage / cost reporting.** v2 emits + `ThreadTokenUsageUpdatedNotification`. Plumb this into our usage + tracking — same path the other runtimes use. + +8. **MCP push notifications inbound.** Earlier research established + that codex's own MCP server mode does NOT support + `notifications/*` for push. So the path for unsolicited mid-session + A2A messages is NOT "codex's MCP client receives notifications from + our MCP server" — it's "molecule-runtime polls inbox via + `wait_for_message`, and on each polled message fires `turn/start` + on the existing thread." The "MCP native" framing here is satisfied + not by codex receiving MCP push, but by the persistent thread + + turn/start delivering the same UX (session continuity + queued or + interrupted handling of new messages mid-thread). + +--- + +## Why this design satisfies "MCP native push parity" + +User goal: every runtime delivers A2A inbox messages with the same +quality of experience as claude-code's MCP `notifications/claude/channel`. + +claude-code path: MCP server pushes notification → claude-code SDK +injects synthetic user turn into running session. + +Codex path: molecule-runtime polls inbox (universal poll path) → +adapter fires `turn/start` on the existing app-server thread → codex +processes the message in-thread with full context. The "push" happens +at the molecule-runtime ↔ adapter boundary; the "native" part is that +codex's own session model handles it as an in-thread turn, not as a +fresh subprocess. + +For mid-turn arrivals: the per-thread serialization (or opt-in +interrupt) gives us behavior equivalent to OpenClaw's per-chat +sequentializer. Equivalent UX to claude-code's mid-session +notification injection in practice — one is a kernel-level interrupt, +the other is a queue-then-dispatch, but the user-visible behavior +("the agent processes my message after the current turn finishes") is +identical. + +--- + +## Sequencing + +This is post-demo work. Order: + +1. **Spec the executor lifecycle** — pin down the open questions + above (especially #1 streaming, #5 MCP passthrough, #6 thread + persistence) before any code lands. +2. **Implement `AppServerProcess`** with thorough unit tests against a + mock stdio. This is the riskiest module (concurrency around + request-id correlation + notification dispatch); land it first + with high coverage. +3. **Implement `CodexAppServerExecutor`** on top. +4. **Build the template repo skeleton** (Dockerfile, config.yaml, + start.sh, README) once the Python side runs locally. +5. **Add codex to `manifest.json`** and the runtime registry. +6. **End-to-end verify** per `feedback_close_on_user_visible_not_merge` + — boot a real workspace, send A2A messages, observe streamed + responses + thread continuity + queued mid-turn handling. + +Estimated total: 3-5 engineering days for v1, plus E2E hardening. 
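---

## Appendix: `AppServerProcess` sketch

For reference against sequencing step 2, a minimal sketch of the NDJSON
stdio client — child management, request-id correlation, notification
fan-out, and the shutdown sequence from section 5. The `codex
app-server` invocation and `initialize` call match what was validated
against 0.72.0; everything else (names, error handling, timeouts) is
illustrative, not a reference implementation.

```python
import asyncio
import itertools
import json
from typing import Any, Callable


class AppServerProcess:
    """Minimal JSON-RPC-over-NDJSON client around a codex app-server child."""

    def __init__(self, proc: asyncio.subprocess.Process):
        self._proc = proc
        self._ids = itertools.count(1)
        self._pending: dict[int, asyncio.Future] = {}
        self._subscribers: list[Callable[[str, dict], None]] = []
        self._reader = asyncio.create_task(self._read_loop())

    @classmethod
    async def start(cls) -> "AppServerProcess":
        proc = await asyncio.create_subprocess_exec(
            "codex", "app-server",
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
        )
        return cls(proc)

    async def initialize(self, client_info: dict) -> Any:
        return await self.request("initialize", {"clientInfo": client_info})

    async def request(self, method: str, params: dict | None = None) -> Any:
        req_id = next(self._ids)
        fut: asyncio.Future = asyncio.get_running_loop().create_future()
        self._pending[req_id] = fut
        line = json.dumps({"jsonrpc": "2.0", "id": req_id,
                           "method": method, "params": params or {}})
        self._proc.stdin.write(line.encode() + b"\n")
        await self._proc.stdin.drain()
        return await fut

    def subscribe(self, cb: Callable[[str, dict], None]) -> Callable[[], None]:
        self._subscribers.append(cb)
        return lambda: self._subscribers.remove(cb)

    async def _read_loop(self) -> None:
        while True:
            raw = await self._proc.stdout.readline()
            if not raw:
                break  # child closed stdout / exited
            msg = json.loads(raw)
            if "id" in msg and msg["id"] in self._pending:
                fut = self._pending.pop(msg["id"])
                if "error" in msg:
                    fut.set_exception(
                        RuntimeError(msg["error"].get("message", "rpc error")))
                else:
                    fut.set_result(msg.get("result"))
            elif "method" in msg:
                # Server-pushed notification (deltas, turn/completed, …).
                # Server-initiated requests (e.g. approvals) are out of scope here.
                for cb in list(self._subscribers):
                    cb(msg["method"], msg.get("params", {}))

    async def close(self) -> None:
        # Shutdown per section 5: close stdio, wait, SIGKILL on timeout.
        self._proc.stdin.close()
        try:
            await asyncio.wait_for(self._proc.wait(), timeout=5)
        except asyncio.TimeoutError:
            self._proc.kill()
        self._reader.cancel()
```

The real module additionally needs `thread/resume` on restart and the
server-initiated request path (approval prompts), both ignored above.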
diff --git a/docs/integrations/hermes-platform-plugins-upstream-pr.md b/docs/integrations/hermes-platform-plugins-upstream-pr.md new file mode 100644 index 00000000..05a13769 --- /dev/null +++ b/docs/integrations/hermes-platform-plugins-upstream-pr.md @@ -0,0 +1,191 @@ +# Upstream PR draft: Pluggable platform adapters for hermes-agent + +**Status:** Draft — pre-submission review +**Target repo:** `NousResearch/hermes-agent` +**Owner:** Molecule AI (hongmingwang@moleculesai.app) +**Date drafted:** 2026-05-02 + +--- + +## Why this draft exists + +Molecule needs to deliver A2A inbox messages to a hermes-hosted agent the same way Telegram messages reach it today — through `_handle_message`, with `set_busy_session_handler` semantics for mid-turn arrivals. Today this requires forking `gateway/run.py` because the platform adapter system is closed (`_create_adapter` is a hardcoded if/elif chain at lines 2424-2578). + +But hermes already ships a working plugin discovery system for memory backends (`plugins/memory/__init__.py`). Extending the same pattern to platforms is a small, symmetric change — not novel architecture. This draft documents the proposed upstream PR before we open it, so we can iterate locally on tone, scope, and code shape. + +--- + +## Proposed PR title + +> Pluggable platform adapters via `plugins/platforms/` discovery + +(Mirrors the existing `plugins/memory/` shape so the title alone signals "this is the same pattern, just for the other subsystem.") + +--- + +## PR body + +### Problem + +Hermes ships 19 in-tree platform adapters (Telegram, Discord, WhatsApp, Slack, Signal, Mattermost, Matrix, Email, SMS, DingTalk, Feishu, WeCom variants, Weixin, BlueBubbles, QQBot, HomeAssistant, API server, Webhook). Each is wired by editing two files: + +- `gateway/config.py:48-69` — append a `Platform` enum value +- `gateway/run.py:2424-2578` — append an `elif platform == Platform.X:` branch in `_create_adapter()` + +For platforms with broad demand (Telegram, Slack, etc.) this is fine: the maintenance load lives upstream, every user benefits. For platforms with narrow but real demand — enterprise-internal channels (Rocket.Chat, RingCentral, Zulip), agent-to-agent inbox protocols (e.g. Molecule's A2A), niche regional platforms, or experimental transports — the only path today is forking `gateway/run.py`. Forks drift, defeat the purpose of an OSS gateway, and discourage contribution back upstream. + +### Prior art (already in hermes) + +The memory subsystem solved exactly this problem at `plugins/memory/__init__.py`: + +1. **Two-tier discovery** — bundled providers in `plugins/memory//` plus user-installed providers in `$HERMES_HOME/plugins//`. Bundled wins on name collision. +2. **`register(ctx)` collector pattern** (`plugins/memory/__init__.py:264-305`) — a plugin's `__init__.py` exposes a `register(ctx)` function; `ctx` already supports `register_memory_provider`, `register_tool`, `register_hook`, `register_cli_command`. +3. **`plugin.yaml` manifest** for description and metadata. +4. **Config-driven activation** (`memory.provider: honcho` selects which provider loads). + +Adding `register_platform_adapter` to the same collector and a `plugins/platforms/` discovery directory extends this pattern symmetrically. + +### Proposal + +**Three small changes:** + +1. 
**New collector method** in `plugins/memory/__init__.py:_ProviderCollector` (or a new shared `plugins/_collector.py` if maintainers prefer cleaner separation): + + ```python + def register_platform_adapter(self, name: str, adapter_class: type, requirements_check=None): + """Register a platform adapter loadable as plugin. + + name: unique platform identifier (matches gateway.platforms. in config) + adapter_class: subclass of BasePlatformAdapter + requirements_check: optional callable returning bool — same shape as + existing check_telegram_requirements() etc. + """ + self.platform_adapters[name] = (adapter_class, requirements_check) + ``` + +2. **New `plugins/platforms/__init__.py`** mirroring `plugins/memory/__init__.py` — `discover_platform_adapters()`, `load_platform_adapter(name)`, two-tier (bundled + `$HERMES_HOME/plugins/`) discovery. + +3. **`_create_adapter()` fallback** at `gateway/run.py:2578` — after the in-tree if/elif chain returns None, attempt plugin lookup: + + ```python + # Existing in-tree adapters checked first (precedence preserved). + # If no match, fall through to plugin discovery. + from plugins.platforms import load_platform_adapter + plugin_entry = load_platform_adapter(platform.value) + if plugin_entry: + adapter_class, req_check = plugin_entry + if req_check and not req_check(): + logger.warning(f"{platform.value}: plugin requirements not met") + return None + return adapter_class(config) + return None + ``` + +4. **`Platform` enum becomes open-set.** Today it's `Enum`; switch to a string-backed pattern that accepts unknown values (still validates against the union of in-tree + discovered plugins at config-load time): + + ```python + # gateway/config.py — replace Enum with frozen dataclass + dynamic registry. + # Keeps the in-tree values as module-level singletons for backward compat: + # Platform.TELEGRAM still works as today. + ``` + + This is the only "shape change" in the PR. Backward compat is straightforward: every existing `Platform.TELEGRAM` reference continues to work because the module exports the same names. + +### Backward compatibility + +- All 19 in-tree adapters keep their hardcoded path in `_create_adapter()` (precedence: in-tree wins on name collision, exactly like memory plugins). +- Existing config files (`gateway.platforms.telegram.enabled: true`) continue to work unchanged. +- No new mandatory config keys. +- Plugin discovery only runs if the platform name doesn't match an in-tree value, so cold-start cost is zero for users who don't use plugins. +- Fork-then-add-platform users can migrate to plugins at their own pace; the in-tree path isn't deprecated. + +### Test plan + +- **Unit**: discovery scans both bundled and user dirs, respects precedence. +- **Unit**: `_create_adapter()` falls through to plugin lookup only when in-tree doesn't match. +- **Integration**: ship a minimal `plugins/platforms/example/` in-tree (read-only, returns canned messages) so CI exercises the full plugin code path. Same approach `plugins/memory/holographic/` takes today. +- **Manual**: Molecule will publish `hermes-platform-molecule-a2a` as the first external consumer once this lands. + +### Documentation + +- Extend `CONTRIBUTING.md`'s "Should it be a Skill or a Tool?" section with "Should it be a Platform Plugin or an in-tree Platform?" — same shape, same decision tree. +- Add `plugins/platforms/README.md` mirroring `plugins/memory/`'s convention. 
+ +### Out of scope (intentionally) + +- **Setuptools `entry_points`** — could be added later as a third discovery tier (after bundled + `$HERMES_HOME/plugins/`). Skipping for v1 because the directory-based discovery already covers the demand and matches the memory pattern. Adding entry_points is a non-breaking extension. +- **Hot-reload** — plugins discovered at gateway boot, no live re-scan. Matches memory plugins. +- **Sandboxing** — plugins run with full hermes process privileges. Same trust model as memory plugins; documented in the new README. + +### Reference consumer + +Molecule AI will ship `hermes-platform-molecule-a2a` as the first external consumer. Use case: deliver agent-to-agent inbox messages (from peer agents authenticated at the platform layer, not the Telegram-user level) into the same `_handle_message` dispatch Telegram uses, with `internal=True` events to bypass user-auth. Expected timeline: within 2 weeks of merge. + +--- + +## Open questions for upstream maintainers + +Per `CONTRIBUTING.md`, the right channel for design proposals is +**GitHub Discussions**, not Discord (Discord is for "questions, +showcasing projects, and sharing skills" — Discussions is the +documented channel for "design proposals and architecture discussions"). + +Open a Discussion at `NousResearch/hermes-agent/discussions` titled +"RFC: pluggable platform adapters via `plugins/platforms/`" with the +problem + proposal + open questions before filing the PR. This gives +maintainers space to weigh in on shape before code is in flight. + +Open questions to put in the Discussion: + +1. **Preferred naming.** `register_platform_adapter` vs `register_platform` vs `register_channel`. Consistency with memory's `register_memory_provider` argues for the long form. +2. **Enum vs string.** Is the maintainer team open to making `Platform` open-set? If not, fallback design: keep enum, add a single `Platform.PLUGIN` sentinel + a `plugin_name` field on `PlatformConfig`. Slightly uglier but smaller blast radius. +3. **Testing**: `plugins/platforms/example/` checked into the repo, or test-fixtures-only? Memory plugins are real (mem0, honcho, supermemory bundled), so a real example seems consistent. +4. **Discovery ordering**: confirm the user wants bundled-wins precedence (matches memory) vs user-can-override-bundled (would let downstream patch a buggy in-tree adapter without forking). Current memory pattern is bundled-wins; we'll match it unless told otherwise. + +--- + +## Effort estimate + +- **Code change**: ~150 LOC across `plugins/platforms/__init__.py` (new), `gateway/config.py` (Platform refactor), `gateway/run.py` (10-line fallback in `_create_adapter`), tests (~50 LOC). +- **Docs**: ~80 LOC across `CONTRIBUTING.md` extension and new `plugins/platforms/README.md`. +- **Review cycle**: depends on maintainer responsiveness. Memory plugin system shipped in v0.5–0.7 era; platform plugin system would land for v0.11 if accepted. + +--- + +## After this PR lands (Molecule-side follow-up) + +1. Publish `hermes-platform-molecule-a2a` (PyPI + `~/.hermes/plugins/molecule-a2a/`). +2. Bump our hermes workspace template to declare `plugins.platforms.molecule_a2a.enabled: true`. +3. Remove the polling shim from `molecule-ai-workspace-template-hermes/adapter.py` once the plugin path is verified end-to-end. + +--- + +## Status checklist (for our own tracking) + +Per user's gating: "if the plugin works locally in our docker setup +and e2e testing works, yes [submit]". 
Validation prerequisites: + +- [ ] Build a working `plugins/platforms/molecule_a2a/` plugin against + a forked hermes-agent with the proposed change applied +- [ ] Bake the forked hermes + plugin into a local copy of our + `molecule-ai-workspace-template-hermes` Docker image +- [ ] E2E: boot the local image, send A2A messages from a peer agent, + observe `_handle_message` dispatch + reply through A2A queue +- [ ] Confirm `Platform` enum refactor doesn't break downstream — grep + for `Platform.X` usages across hermes +- [ ] Confirm `$HERMES_HOME` is the right user-plugin root for + platforms (matches memory convention) +- [ ] Open a GitHub Discussion at + `NousResearch/hermes-agent/discussions` titled + "RFC: pluggable platform adapters via plugins/platforms/" with + design + open questions; wait for maintainer feedback +- [ ] Branch name: `feat/pluggable-platform-adapters` per + CONTRIBUTING.md branch convention +- [ ] Commit prefix: `feat(gateway): pluggable platform adapters + via plugins/platforms/` per Conventional Commits + scope `gateway` +- [ ] PR description covers what/why + how-to-test + platforms tested, + per CONTRIBUTING.md PR-description requirements +- [ ] Open PR against `NousResearch/hermes-agent` main once Discussion + lands consensus +- [ ] Track PR; bump cadence weekly; if stalled past 4 weeks, propose + fork-and-bundle as fallback for our hermes template image diff --git a/docs/integrations/runtime-native-mcp-status.md b/docs/integrations/runtime-native-mcp-status.md new file mode 100644 index 00000000..41d0b044 --- /dev/null +++ b/docs/integrations/runtime-native-mcp-status.md @@ -0,0 +1,162 @@ +# Runtime native-MCP push parity — status + +**Goal:** every workspace runtime delivers Molecule A2A inbox messages +with the same UX as claude-code's MCP `notifications/claude/channel` +push: session continuity + queued or interrupted handling of new +messages mid-thread, no fresh subprocess per message. + +Tracked across four runtime streams. Updated 2026-05-02. + +--- + +## claude-code + +**Status:** ✅ Done. Native MCP `notifications/claude/channel` push +shipped via `workspace/a2a_mcp_server.py`. Requires the host to launch +with `--dangerously-load-development-channels server:molecule`. + +No further work. + +--- + +## OpenClaw + +**Status:** Scaffolded; awaiting validation + companion adapter rewrite. + +**Path:** Channel-plugin SDK (`openclaw/plugin-sdk`), auto-discovered +from `~/.openclaw/plugins//` or workspace `.openclaw/`. Plugin +registers an HTTP webhook on `openclaw gateway`; Molecule workspace +adapter POSTs A2A messages to it; gateway dispatches through the same +`dispatchReplyWithBufferedBlockDispatcher` kernel call native channels +(Telegram, Lark, Slack, Discord) use. + +**Artifacts landed:** +- `molecule-ai-workspace-template-openclaw/packages/openclaw-channel-plugin/` + - `package.json`, `openclaw.plugin.json` (manifest), `index.ts` + (channel + webhook handler), `README.md`, `tsconfig.json` +- Pre-release `v0.1.0-pre`. Mirrors `rabbit-lark-bot` reference + plugin shape. + +**Remaining (task #84, #87):** +1. Validate against a running OpenClaw gateway. Open questions in the + plugin README: `resolveAgentRoute` peer-id shape, + `dispatchReplyWithBufferedBlockDispatcher` async semantics, + `outbound.sendText` no-op safety. +2. Rewrite Python adapter (`adapter.py`) to stop shelling out + `openclaw agent --message ...` and instead POST to the plugin's + webhook + run `/agent-reply` callback HTTP server. **Post-demo + work** (touches a working integration). 
+ +--- + +## hermes + +**Status:** Upstream PR drafted; short-term shim deemed unnecessary. + +**Path:** Open the upstream `BasePlatformAdapter` system to external +plugins. Hermes already ships a working plugin discovery system for +memory backends (`plugins/memory/`, `register(ctx)` collector pattern, +`$HERMES_HOME/plugins//` user-installed tier). The PR extends +the same shape to platforms — `register_platform_adapter(...)` on the +existing collector, new `plugins/platforms/` discovery directory, +3-line fallback in `_create_adapter()`. Symmetric, not novel. + +**Artifacts landed:** +- `docs/integrations/hermes-platform-plugins-upstream-pr.md` — full + PR draft including problem, prior art, proposal, code shape, + backward compat, test plan, and four open questions to resolve in + Discord before submitting. + +**Why no short-term polling shim:** earlier framing was wrong. Molecule +runtime already polls the inbox via `wait_for_message` per turn; each +polled message fires a fresh `execute()` on the adapter, which +proxies to hermes's stateless `/v1/chat/completions`. Adding adapter- +side polling would be duplicate work. The genuine short-term gap is +**session continuity** (hermes daemon doesn't see a single +conversation across turns because chat/completions is stateless), not +push latency. That gap is solved by the upstream PR; no +intermediate shim earns its complexity. + +**Remaining (task #83):** +1. Reach out in Nous Research Discord to validate open questions + (Platform enum-vs-string refactor, naming, example-plugin scope). +2. Submit PR to `NousResearch/hermes-agent`. **Requires user + confirmation** — opening an upstream PR is an action visible to + others. +3. Once merged: ship `hermes-platform-molecule-a2a` as the first + external consumer, bump our hermes workspace template to enable + it, remove any transitional code. + +--- + +## Codex (OpenAI Codex CLI) + +**Status:** Template structurally complete (12 files, 12/12 tests passing, +validated against real codex-cli 0.72.0). Awaiting molecule-core +registry integration + E2E. + +**Path:** Persistent `codex app-server` stdio JSON-RPC client +(NDJSON-framed, v2 protocol). One app-server child per workspace +session; one `thread/start` per session; each A2A message becomes a +`turn/start` RPC; agent responses arrive as +`agent_message_delta` notifications. Per-thread serialization for +mid-turn arrivals (matches OpenClaw's per-chat sequentializer). +Optional `turn/interrupt` for "latest message wins" workspaces. + +**Artifacts landed:** +- `docs/integrations/codex-app-server-adapter-design.md` — full design + including RPC sequence, executor skeleton, eight open questions. +- `molecule-ai-workspace-template-codex/` — full template repo + scaffolded: + - `app_server.py` (286 LOC) — async JSON-RPC over NDJSON stdio + - `executor.py` (~270 LOC) — thread bootstrap, turn dispatch, + notification accumulation, mid-turn serialization + - `adapter.py` — thin `BaseAdapter` shell + preflight + - `Dockerfile`, `start.sh`, `config.yaml`, `requirements.txt`, + `README.md` + - `tests/` — **12/12 tests pass** (7 vs NDJSON mock child, 5 vs + fake AppServerProcess covering executor logic) + +**Validated against live `codex-cli 0.72.0`:** NDJSON framing, +`initialize` handshake, AND `thread/start` all work end-to-end. +**Schema-runtime drift caught:** real binary returns `thread.id`, +not `thread.threadId` as the JSON schema claims. Executor now +accepts both shapes; without the smoke test this would have been +a production bug. 
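The tolerant lookup is small (sketch; key names as observed vs. as
documented in the schema):

```python
def extract_thread_id(thread_start_result: dict) -> str:
    thread = thread_start_result["thread"]
    # codex-cli 0.72 returns {"thread": {"id": ...}}; the published schema
    # says {"thread": {"threadId": ...}} — accept both until upstream settles.
    return thread.get("id") or thread["threadId"]
```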
+ +**Remaining (task #85, #86):** +1. Register `codex` in molecule-core's `manifest.json` + + `workspace-server/internal/handlers/runtime_registry.go`. + **Defer to post-demo** — touches working live registry. +2. E2E verification with a real Molecule workspace + peer A2A + traffic, per `feedback_close_on_user_visible_not_merge`. + +--- + +## Cross-cutting (task #86) + +End-to-end verification per `feedback_close_on_user_visible_not_merge`. +For each runtime, the closure criterion is not "code merged" but +"observed: real workspace boots → A2A message from peer agent → +delivered to running session → reply returned through A2A response +queue → peer agent receives". No runtime stream closes until that +chain is observed. + +--- + +## What's blocking what + +| Stream | Blocked on | +|---|---| +| claude-code | (done) | +| OpenClaw plugin | live gateway validation, then post-demo adapter rewrite | +| OpenClaw adapter rewrite | post-demo timing | +| hermes upstream PR | user confirmation to submit + Discord pre-validation | +| hermes consumer plugin | upstream PR merging | +| codex implementation | resolve 8 open questions, then post-demo eng time | +| E2E verification | each runtime stream completing | + +Three of four runtime streams are at decision points needing user +input. Pre-demo (T-4d to 2026-05-06), the safe move is to land the +remaining design + scaffolding work and defer all behavioral changes to +post-demo. diff --git a/manifest.json b/manifest.json index 72f37404..c75cdf27 100644 --- a/manifest.json +++ b/manifest.json @@ -32,7 +32,8 @@ {"name": "deepagents", "repo": "Molecule-AI/molecule-ai-workspace-template-deepagents", "ref": "main"}, {"name": "hermes", "repo": "Molecule-AI/molecule-ai-workspace-template-hermes", "ref": "main"}, {"name": "gemini-cli", "repo": "Molecule-AI/molecule-ai-workspace-template-gemini-cli", "ref": "main"}, - {"name": "openclaw", "repo": "Molecule-AI/molecule-ai-workspace-template-openclaw", "ref": "main"} + {"name": "openclaw", "repo": "Molecule-AI/molecule-ai-workspace-template-openclaw", "ref": "main"}, + {"name": "codex", "repo": "Molecule-AI/molecule-ai-workspace-template-codex", "ref": "main"} ], "org_templates": [ {"name": "molecule-dev", "repo": "Molecule-AI/molecule-ai-org-template-molecule-dev", "ref": "main"},