forked from molecule-ai/molecule-core
Merge pull request #2512 from Molecule-AI/feat/register-codex-runtime
feat: register codex runtime + runtime native-MCP design docs
commit 35cb6ba089

docs/integrations/codex-app-server-adapter-design.md (new file, 360 lines)
@@ -0,0 +1,360 @@
# Codex CLI workspace adapter — app-server design

**Status:** Design draft — pre-implementation
**Owner:** Molecule AI (hongmingwang@moleculesai.app)
**Date:** 2026-05-02
**Codex version validated against:** `codex-cli 0.72.0`
**Related:** `docs/integrations/hermes-platform-plugins-upstream-pr.md`,
`molecule-ai-workspace-template-openclaw/packages/openclaw-channel-plugin/`

---
## Goal

Add a Molecule workspace template for the OpenAI Codex CLI runtime
(`@openai/codex` v0.72+). The template should give Codex agents the
same A2A inbox + mid-session push behavior the other supported
runtimes have:

- **claude-code:** MCP `notifications/claude/channel`
- **OpenClaw:** channel-plugin webhook into the gateway kernel
- **hermes:** `BasePlatformAdapter` (pending upstream PR; polling fallback today)
- **codex (this design):** persistent `codex app-server` stdio JSON-RPC
  client; A2A messages become `turn/start` calls against a long-lived
  thread

Today there is no codex template. The legacy fallback registry entry
at `workspace-server/internal/handlers/runtime_registry.go:83` exists
only to keep old workspaces from crashing — there is no live adapter,
no Dockerfile, nothing in `manifest.json`. This design covers the
fresh build.

---
## Architecture decision: app-server, not `codex exec`

`codex exec --json` is the obvious shape — one CLI subprocess per
A2A message, the same anti-pattern OpenClaw used to have and that we are
replacing. It loses session continuity (no shared thread), pays
process-spawn cost on every turn, and gives no path to mid-turn
interruption.

`codex app-server` is a long-running JSON-RPC server over stdio that
holds thread state in memory. The v2 protocol (validated below) gives
us:

- `thread/start` → returns `threadId`
- `turn/start` → input array, threadId required → returns `turnId`
- `turn/interrupt` → cancel a running turn by `(threadId, turnId)`
- Server-pushed notifications: `agent_message_delta`, `turn/started`,
  `turn/completed`, `reasoning_text_delta`,
  `command_execution_output_delta`, `mcp_tool_call_progress`,
  `error_notification`, etc.

A persistent app-server child plus a small async stdio reader gives us
session continuity AND mid-turn injection. Same dual-win shape we got
from migrating OpenClaw away from `openclaw agent`.

### Why not v1?

v1 of the protocol exposes `newConversation` + `sendUserMessage` /
`sendUserTurn` (one-shot per message, no streaming notifications). v2
introduces threads + turns + delta notifications. v2 is the
forward-looking surface; we build against v2 from the start.

---
## RPC sequence

### 1. Boot

```
adapter spawn ▶ codex app-server (stdio NDJSON)
        ◀ ready (process up)
adapter ▶ {"jsonrpc":"2.0","id":1,"method":"initialize",
           "params":{"clientInfo":{"name":"molecule-runtime","version":"…"}}}
adapter ◀ {"id":1,"result":{"userAgent":"codex_cli_rs/0.72.0 …"}}
```

Validated 2026-05-02 against the installed binary — NDJSON framing and
initialize work as shown.
### 2. Thread per workspace session

```
adapter ▶ thread/start
          params: {model, sandboxPolicy, approvalPolicy, cwd,
                   baseInstructions, developerInstructions, …}
adapter ◀ {result: {thread: {threadId: "th_…"}}}
```

`threadId` is cached on the adapter for the workspace's lifetime. On
adapter restart we use `thread/resume` against the persisted ID
(written to disk under `~/.codex/sessions/` by codex itself, but we
also keep our own pointer in workspace state for fast restore).
### 3. A2A message → turn/start

For each inbound A2A message:

```
adapter ▶ turn/start
          params: {threadId, input: [{type:"text", text:"…"}], …}
adapter ◀ {result: {turn: {turnId: "tu_…"}}}

(server pushes notifications)
adapter ◀ turn/started
adapter ◀ agent_message_delta (text chunk)
adapter ◀ agent_message_delta (text chunk)
…
adapter ◀ turn/completed
```

The adapter accumulates `agent_message_delta` chunks into a buffer
keyed by `turnId`, emits them onto the A2A response queue (streamed if
the molecule-runtime contract supports streaming, otherwise assembled
into a single final message on `turn/completed`).
### 4. Mid-turn injection — the load-bearing case

**Default policy: per-thread serialization.** If a turn is already
running when a second A2A message arrives, queue the new message and
fire `turn/start` once the current `turn/completed` lands. This
matches OpenClaw's per-chat sequentializer behavior — the A2A peer
sees their messages handled in order, and we don't need
`turn/interrupt` for the common case.

**Opt-in policy: interrupt-and-rerun.** For workspaces that prefer
"latest message wins" semantics (rare; configurable), the adapter
fires `turn/interrupt` with `(threadId, currentTurnId)`, waits for
`turn/completed` (with cancelled status), then fires `turn/start` with
the combined context: previous user message + agent's partial response
so far + new message, so the agent has full context of what got
interrupted. Off by default.
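A sketch of the interrupt-and-rerun path under these assumptions; `app_server.request` follows the executor skeleton later in this doc, while the helper names here (`build_interrupt_context`, `turn_done`) are illustrative, not a fixed API:

```python
import asyncio


def build_interrupt_context(last_prompt: str, partial_chunks: list[str],
                            new_message: str) -> str:
    """Assemble the rerun prompt: previous user message + the agent's
    partial response so far + the newly arrived message."""
    return (
        "[previous turn was interrupted]\n"
        f"Previous user message:\n{last_prompt}\n\n"
        f"Your partial response so far:\n{''.join(partial_chunks)}\n\n"
        f"New message:\n{new_message}"
    )


async def interrupt_and_rerun(app_server, thread_id: str, turn_id: str,
                              turn_done: asyncio.Event, context: str) -> None:
    # Cancel the running turn, wait for its (cancelled) turn/completed,
    # then start a fresh turn carrying the combined context.
    await app_server.request("turn/interrupt",
                             {"threadId": thread_id, "turnId": turn_id})
    await turn_done.wait()
    await app_server.request("turn/start", {
        "threadId": thread_id,
        "input": [{"type": "text", "text": context}],
    })
```

The context-assembly half is pure string work and easy to unit-test; the RPC half only touches the two methods validated above.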

### 5. Shutdown

```
adapter ▶ {"method":"shutdown"} (if v2 exposes one; otherwise SIGTERM)
adapter ▶ close stdio
adapter ▶ wait(child, timeout=5s); on timeout SIGKILL
```
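That escalation ladder, sketched with `asyncio`'s subprocess API — assuming app-server exits cleanly on stdin EOF; if v2 turns out to expose an explicit `shutdown` request, it goes first:

```python
import asyncio
import contextlib


async def shutdown_app_server(proc: asyncio.subprocess.Process,
                              timeout: float = 5.0) -> None:
    """Close stdio, then escalate: wait for clean exit, SIGTERM on
    timeout, SIGKILL as a last resort."""
    if proc.stdin is not None:
        proc.stdin.close()          # EOF lets the child exit cleanly
    try:
        await asyncio.wait_for(proc.wait(), timeout)
        return
    except asyncio.TimeoutError:
        with contextlib.suppress(ProcessLookupError):
            proc.terminate()        # SIGTERM
    try:
        await asyncio.wait_for(proc.wait(), timeout)
    except asyncio.TimeoutError:
        with contextlib.suppress(ProcessLookupError):
            proc.kill()             # SIGKILL
        await proc.wait()
```

The `ProcessLookupError` suppression covers the race where the child exits between the timeout firing and the signal landing.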

---
## File layout (new template repo)

```
molecule-ai-workspace-template-codex/
├── adapter.py        # BaseAdapter shell, thin (~50 LOC)
├── executor.py       # CodexAppServerExecutor — the RPC client (~300 LOC)
├── app_server.py     # AppServerProcess — stdio child + NDJSON reader (~150 LOC)
├── config.yaml
├── Dockerfile        # node:20 + npm i -g @openai/codex@0.72
├── start.sh          # boots adapter; codex app-server is spawned per session by executor
├── requirements.txt
├── README.md
└── tests/
    ├── test_app_server.py  # mocks stdio; tests framing, request/notification dispatch
    └── test_executor.py    # mocks AppServerProcess; tests turn lifecycle, interrupt
```

Modeled on the hermes template (which is the closest existing shape:
adapter.py + executor.py separation; daemon proxy via local IPC). The
extra `app_server.py` exists because the JSON-RPC client + child
process management is non-trivial enough to warrant its own module
with its own tests.

---
## Executor skeleton

```python
# executor.py — A2A → codex app-server bridge
import asyncio
# (plus project imports: AgentExecutor, RequestContext, EventQueue,
#  AdapterConfig, new_agent_text_message, extract_message_text,
#  AppServerProcess, MOLECULE_RUNTIME_VERSION, _TURN_TIMEOUT)


class CodexAppServerExecutor(AgentExecutor):
    """Holds one app-server child + thread, dispatches A2A turns as turn/start RPCs."""

    def __init__(self, config: AdapterConfig):
        self._config = config
        self._app_server: AppServerProcess | None = None
        self._thread_id: str | None = None
        self._current_turn_id: str | None = None  # set while a turn is running
        self._turn_lock = asyncio.Lock()  # serialize per-thread by default

    async def _ensure_thread(self) -> str:
        if self._app_server is None:
            self._app_server = await AppServerProcess.start()
            await self._app_server.initialize(client_info={
                "name": "molecule-runtime",
                "version": MOLECULE_RUNTIME_VERSION,
            })
        if self._thread_id is None:
            resp = await self._app_server.request("thread/start", {
                "model": self._config.model or None,
                "developerInstructions": self._config.system_prompt or None,
                # other policy fields (sandbox, approval) — Molecule defaults
            })
            self._thread_id = resp["thread"]["threadId"]
        return self._thread_id

    async def execute(self, context: RequestContext, event_queue: EventQueue) -> None:
        prompt = extract_message_text(context.message) or ""
        if not prompt.strip():
            await event_queue.enqueue_event(new_agent_text_message("(empty prompt)"))
            return

        async with self._turn_lock:  # per-thread serialization
            thread_id = await self._ensure_thread()

            # Subscribe to delta notifications BEFORE starting the turn so we
            # don't race the first agent_message_delta.
            buffer: list[str] = []
            done = asyncio.Event()
            error: Exception | None = None

            def on_notification(method: str, params: dict) -> None:
                nonlocal error
                if method == "agent_message_delta":
                    buffer.append(params.get("delta", ""))
                elif method == "turn/completed":
                    done.set()
                elif method == "error_notification":
                    error = RuntimeError(params.get("message", "unknown app-server error"))
                    done.set()

            unsub = self._app_server.subscribe(on_notification)
            try:
                resp = await self._app_server.request("turn/start", {
                    "threadId": thread_id,
                    "input": [{"type": "text", "text": prompt}],
                })
                # Record the running turn so cancel() can interrupt it.
                self._current_turn_id = resp["turn"]["turnId"]
                await asyncio.wait_for(done.wait(), timeout=_TURN_TIMEOUT)
            finally:
                self._current_turn_id = None
                unsub()

            if error:
                await event_queue.enqueue_event(
                    new_agent_text_message(f"[codex error] {error}"))
                return
            await event_queue.enqueue_event(new_agent_text_message("".join(buffer)))

    async def cancel(self, context: RequestContext, event_queue: EventQueue) -> None:
        # When the molecule-runtime cancels a request, fire turn/interrupt
        # against the currently-running turn. Best-effort — racing
        # turn/completed is fine, app-server returns a noop in that case.
        if self._app_server and self._thread_id and self._current_turn_id:
            await self._app_server.request("turn/interrupt", {
                "threadId": self._thread_id,
                "turnId": self._current_turn_id,
            })
```

The `AppServerProcess` class encapsulates: stdio child management,
NDJSON line reader/writer, request-id correlation, notification
subscriber registry, and graceful shutdown. Standard async stdio
JSON-RPC client — nothing exotic.

---
## Open questions to resolve before implementation

1. **MoleculeRuntime streaming contract.** Does our A2A executor
   contract support emitting incremental events (so the user sees
   partial responses as the agent streams), or do we always assemble
   on `turn/completed`? If streaming is supported, we want to forward
   each `agent_message_delta` as an A2A event for parity with hermes
   gateway streaming. (Cross-reference: the hermes adapter currently
   doesn't stream either — `executor.py:122` sets `stream=False` —
   so non-streaming is the safe v1 baseline.)
2. **Sandbox policy default.** Codex defaults to `read-only` for safety
   in CLI mode; for workspace use we need write access to the
   workspace tree. Pick a sensible default in `thread/start` —
   probably `workspace-write` scoped to the workspace cwd.
3. **Approval policy default.** Codex's `--ask-for-approval` modes are
   `untrusted`, `on-failure`, and `never`. Workspace agents need
   `never` (they can't prompt a human). Confirm this is exposed via
   `approvalPolicy` in `thread/start`.
4. **Auth — login flow.** Codex supports `login api-key` (env
   `OPENAI_API_KEY`) and `login chatgpt` (interactive OAuth). For
   workspace use we mandate the API key. Document this in the template's
   README and surface it as a required env in config.yaml.
5. **MCP server passthrough.** Codex's own `mcp_servers` config lets
   the agent call out to MCP servers as a CLIENT. Should the workspace
   adapter automatically wire `~/.codex/config.toml` so the agent can
   reach the molecule MCP server (chat_history, recall_memory,
   delegate_task)? Almost certainly yes — but verify the env-var
   substitution pattern works in TOML.
6. **Thread persistence across workspace restarts.** Codex stores
   sessions on disk under `~/.codex/sessions/`. The adapter should
   persist the threadId in workspace state so a restart resumes the
   thread (`thread/resume`) rather than starting fresh. This matches
   the existing molecule-runtime convention for session continuity.
7. **Token usage / cost reporting.** v2 emits
   `ThreadTokenUsageUpdatedNotification`. Plumb this into our usage
   tracking — the same path the other runtimes use.
8. **MCP push notifications inbound.** Earlier research established
   that codex's own MCP server mode does NOT support
   `notifications/*` for push. So the path for unsolicited mid-session
   A2A messages is NOT "codex's MCP client receives notifications from
   our MCP server" — it's "molecule-runtime polls the inbox via
   `wait_for_message`, and on each polled message fires `turn/start`
   on the existing thread." The "MCP native" framing here is satisfied
   not by codex receiving MCP push, but by the persistent thread +
   turn/start delivering the same UX (session continuity + queued or
   interrupted handling of new messages mid-thread).

---
## Why this design satisfies "MCP native push parity"

User goal: every runtime delivers A2A inbox messages with the same
quality of experience as claude-code's MCP `notifications/claude/channel`.

claude-code path: the MCP server pushes a notification → the claude-code
SDK injects a synthetic user turn into the running session.

Codex path: molecule-runtime polls the inbox (universal poll path) →
the adapter fires `turn/start` on the existing app-server thread → codex
processes the message in-thread with full context. The "push" happens
at the molecule-runtime ↔ adapter boundary; the "native" part is that
codex's own session model handles it as an in-thread turn, not as a
fresh subprocess.

For mid-turn arrivals: per-thread serialization (or the opt-in
interrupt) gives us behavior equivalent to OpenClaw's per-chat
sequentializer. In practice this is equivalent UX to claude-code's
mid-session notification injection — one is a kernel-level interrupt,
the other is queue-then-dispatch, but the user-visible behavior
("the agent processes my message after the current turn finishes") is
identical.

---
## Sequencing

This is post-demo work. Order:

1. **Spec the executor lifecycle** — pin down the open questions
   above (especially #1 streaming, #5 MCP passthrough, #6 thread
   persistence) before any code lands.
2. **Implement `AppServerProcess`** with thorough unit tests against a
   mock stdio. This is the riskiest module (concurrency around
   request-id correlation + notification dispatch); land it first
   with high coverage.
3. **Implement `CodexAppServerExecutor`** on top.
4. **Build the template repo skeleton** (Dockerfile, config.yaml,
   start.sh, README) once the Python side runs locally.
5. **Add codex to `manifest.json`** and the runtime registry.
6. **End-to-end verify** per `feedback_close_on_user_visible_not_merge`
   — boot a real workspace, send A2A messages, observe streamed
   responses + thread continuity + queued mid-turn handling.

Estimated total: 3-5 engineering days for v1, plus E2E hardening.

docs/integrations/hermes-platform-plugins-upstream-pr.md (new file, 191 lines)
@@ -0,0 +1,191 @@
# Upstream PR draft: Pluggable platform adapters for hermes-agent

**Status:** Draft — pre-submission review
**Target repo:** `NousResearch/hermes-agent`
**Owner:** Molecule AI (hongmingwang@moleculesai.app)
**Date drafted:** 2026-05-02

---

## Why this draft exists

Molecule needs to deliver A2A inbox messages to a hermes-hosted agent the same way Telegram messages reach it today — through `_handle_message`, with `set_busy_session_handler` semantics for mid-turn arrivals. Today this requires forking `gateway/run.py` because the platform adapter system is closed (`_create_adapter` is a hardcoded if/elif chain at lines 2424-2578).

But hermes already ships a working plugin discovery system for memory backends (`plugins/memory/__init__.py`). Extending the same pattern to platforms is a small, symmetric change — not novel architecture. This draft documents the proposed upstream PR before we open it, so we can iterate locally on tone, scope, and code shape.

---
## Proposed PR title

> Pluggable platform adapters via `plugins/platforms/` discovery

(Mirrors the existing `plugins/memory/` shape so the title alone signals "this is the same pattern, just for the other subsystem.")

---
## PR body

### Problem

Hermes ships 19 in-tree platform adapters (Telegram, Discord, WhatsApp, Slack, Signal, Mattermost, Matrix, Email, SMS, DingTalk, Feishu, WeCom variants, Weixin, BlueBubbles, QQBot, HomeAssistant, API server, Webhook). Each is wired by editing two files:

- `gateway/config.py:48-69` — append a `Platform` enum value
- `gateway/run.py:2424-2578` — append an `elif platform == Platform.X:` branch in `_create_adapter()`

For platforms with broad demand (Telegram, Slack, etc.) this is fine: the maintenance load lives upstream, every user benefits. For platforms with narrow but real demand — enterprise-internal channels (Rocket.Chat, RingCentral, Zulip), agent-to-agent inbox protocols (e.g. Molecule's A2A), niche regional platforms, or experimental transports — the only path today is forking `gateway/run.py`. Forks drift, defeat the purpose of an OSS gateway, and discourage contribution back upstream.

### Prior art (already in hermes)

The memory subsystem solved exactly this problem at `plugins/memory/__init__.py`:

1. **Two-tier discovery** — bundled providers in `plugins/memory/<name>/` plus user-installed providers in `$HERMES_HOME/plugins/<name>/`. Bundled wins on name collision.
2. **`register(ctx)` collector pattern** (`plugins/memory/__init__.py:264-305`) — a plugin's `__init__.py` exposes a `register(ctx)` function; `ctx` already supports `register_memory_provider`, `register_tool`, `register_hook`, `register_cli_command`.
3. **`plugin.yaml` manifest** for description and metadata.
4. **Config-driven activation** (`memory.provider: honcho` selects which provider loads).

Adding `register_platform_adapter` to the same collector and a `plugins/platforms/` discovery directory extends this pattern symmetrically.
### Proposal

**Four small changes:**

1. **New collector method** in `plugins/memory/__init__.py:_ProviderCollector` (or a new shared `plugins/_collector.py` if maintainers prefer cleaner separation):

```python
def register_platform_adapter(self, name: str, adapter_class: type, requirements_check=None):
    """Register a platform adapter loadable as a plugin.

    name: unique platform identifier (matches gateway.platforms.<name> in config)
    adapter_class: subclass of BasePlatformAdapter
    requirements_check: optional callable returning bool — same shape as
        existing check_telegram_requirements() etc.
    """
    self.platform_adapters[name] = (adapter_class, requirements_check)
```

2. **New `plugins/platforms/__init__.py`** mirroring `plugins/memory/__init__.py` — `discover_platform_adapters()`, `load_platform_adapter(name)`, two-tier (bundled + `$HERMES_HOME/plugins/`) discovery.
3. **`_create_adapter()` fallback** at `gateway/run.py:2578` — after the in-tree if/elif chain returns None, attempt plugin lookup:

```python
# Existing in-tree adapters checked first (precedence preserved).
# If no match, fall through to plugin discovery.
from plugins.platforms import load_platform_adapter

plugin_entry = load_platform_adapter(platform.value)
if plugin_entry:
    adapter_class, req_check = plugin_entry
    if req_check and not req_check():
        logger.warning(f"{platform.value}: plugin requirements not met")
        return None
    return adapter_class(config)
return None
```

4. **`Platform` enum becomes open-set.** Today it's `Enum`; switch to a string-backed pattern that accepts unknown values (still validated against the union of in-tree + discovered plugins at config-load time):

```python
# gateway/config.py — replace Enum with frozen dataclass + dynamic registry.
# Keeps the in-tree values as module-level singletons for backward compat:
# Platform.TELEGRAM still works as today.
```

This is the only "shape change" in the PR. Backward compat is straightforward: every existing `Platform.TELEGRAM` reference continues to work because the module exports the same names.
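One possible shape for that refactor, sketched under the constraint that downstream code keeps using `Platform.TELEGRAM`, `==`, and dict keys unchanged — the names here are illustrative, not the final design:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """String-backed, open-set replacement for the Platform enum.
    frozen=True keeps instances hashable with equality by value."""
    value: str


_REGISTRY: dict[str, Platform] = {}


def register_platform_value(name: str) -> Platform:
    return _REGISTRY.setdefault(name, Platform(name))


def parse_platform(name: str, plugin_names: set[str]) -> Platform:
    # Config-load validation: accept in-tree values plus discovered
    # plugin names; reject anything else.
    if name in _REGISTRY or name in plugin_names:
        return register_platform_value(name)
    raise ValueError(f"unknown platform: {name}")


# In-tree values stay reachable exactly as before:
Platform.TELEGRAM = register_platform_value("telegram")
Platform.DISCORD = register_platform_value("discord")
# … and so on for the other in-tree adapters.
```

Attaching the singletons as class attributes keeps `Platform.TELEGRAM` working verbatim; the frozen dataclass preserves the hashability and value-equality the enum provided.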

### Backward compatibility

- All 19 in-tree adapters keep their hardcoded path in `_create_adapter()` (precedence: in-tree wins on name collision, exactly like memory plugins).
- Existing config files (`gateway.platforms.telegram.enabled: true`) continue to work unchanged.
- No new mandatory config keys.
- Plugin discovery only runs if the platform name doesn't match an in-tree value, so cold-start cost is zero for users who don't use plugins.
- Fork-then-add-platform users can migrate to plugins at their own pace; the in-tree path isn't deprecated.
### Test plan

- **Unit**: discovery scans both bundled and user dirs, respects precedence.
- **Unit**: `_create_adapter()` falls through to plugin lookup only when in-tree doesn't match.
- **Integration**: ship a minimal `plugins/platforms/example/` in-tree (read-only, returns canned messages) so CI exercises the full plugin code path. Same approach `plugins/memory/holographic/` takes today.
- **Manual**: Molecule will publish `hermes-platform-molecule-a2a` as the first external consumer once this lands.

### Documentation

- Extend `CONTRIBUTING.md`'s "Should it be a Skill or a Tool?" section with "Should it be a Platform Plugin or an in-tree Platform?" — same shape, same decision tree.
- Add `plugins/platforms/README.md` mirroring `plugins/memory/`'s convention.

### Out of scope (intentionally)

- **Setuptools `entry_points`** — could be added later as a third discovery tier (after bundled + `$HERMES_HOME/plugins/`). Skipping for v1 because directory-based discovery already covers the demand and matches the memory pattern. Adding entry_points is a non-breaking extension.
- **Hot-reload** — plugins are discovered at gateway boot, with no live re-scan. Matches memory plugins.
- **Sandboxing** — plugins run with full hermes process privileges. Same trust model as memory plugins; documented in the new README.
### Reference consumer

Molecule AI will ship `hermes-platform-molecule-a2a` as the first external consumer. Use case: deliver agent-to-agent inbox messages (from peer agents authenticated at the platform layer, not the Telegram-user level) into the same `_handle_message` dispatch Telegram uses, with `internal=True` events to bypass user-auth. Expected timeline: within 2 weeks of merge.

---
## Open questions for upstream maintainers

Per `CONTRIBUTING.md`, the right channel for design proposals is
**GitHub Discussions**, not Discord (Discord is for "questions,
showcasing projects, and sharing skills" — Discussions is the
documented channel for "design proposals and architecture discussions").

Open a Discussion at `NousResearch/hermes-agent/discussions` titled
"RFC: pluggable platform adapters via `plugins/platforms/`" with the
problem + proposal + open questions before filing the PR. This gives
maintainers space to weigh in on shape before code is in flight.

Open questions to put in the Discussion:

1. **Preferred naming.** `register_platform_adapter` vs `register_platform` vs `register_channel`. Consistency with memory's `register_memory_provider` argues for the long form.
2. **Enum vs string.** Is the maintainer team open to making `Platform` open-set? If not, the fallback design is: keep the enum, add a single `Platform.PLUGIN` sentinel + a `plugin_name` field on `PlatformConfig`. Slightly uglier but smaller blast radius.
3. **Testing.** Should `plugins/platforms/example/` be checked into the repo, or exist as test fixtures only? Memory plugins are real (mem0, honcho, supermemory bundled), so a real example seems consistent.
4. **Discovery ordering.** Confirm the desired precedence: bundled-wins (matches memory) vs user-can-override-bundled (which would let downstream patch a buggy in-tree adapter without forking). The current memory pattern is bundled-wins; we'll match it unless told otherwise.

---
## Effort estimate

- **Code change**: ~150 LOC across `plugins/platforms/__init__.py` (new), `gateway/config.py` (Platform refactor), `gateway/run.py` (10-line fallback in `_create_adapter`), and tests (~50 LOC).
- **Docs**: ~80 LOC across the `CONTRIBUTING.md` extension and the new `plugins/platforms/README.md`.
- **Review cycle**: depends on maintainer responsiveness. The memory plugin system shipped in the v0.5–0.7 era; the platform plugin system would land for v0.11 if accepted.

---
## After this PR lands (Molecule-side follow-up)

1. Publish `hermes-platform-molecule-a2a` (PyPI + `~/.hermes/plugins/molecule-a2a/`).
2. Bump our hermes workspace template to declare `plugins.platforms.molecule_a2a.enabled: true`.
3. Remove the polling shim from `molecule-ai-workspace-template-hermes/adapter.py` once the plugin path is verified end-to-end.

---
## Status checklist (for our own tracking)

Per the user's gating: "if the plugin works locally in our docker setup
and e2e testing works, yes [submit]". Validation prerequisites:

- [ ] Build a working `plugins/platforms/molecule_a2a/` plugin against
      a forked hermes-agent with the proposed change applied
- [ ] Bake the forked hermes + plugin into a local copy of our
      `molecule-ai-workspace-template-hermes` Docker image
- [ ] E2E: boot the local image, send A2A messages from a peer agent,
      observe `_handle_message` dispatch + reply through the A2A queue
- [ ] Confirm the `Platform` enum refactor doesn't break downstream — grep
      for `Platform.X` usages across hermes
- [ ] Confirm `$HERMES_HOME` is the right user-plugin root for
      platforms (matches memory convention)
- [ ] Open a GitHub Discussion at
      `NousResearch/hermes-agent/discussions` titled
      "RFC: pluggable platform adapters via plugins/platforms/" with
      design + open questions; wait for maintainer feedback
- [ ] Branch name: `feat/pluggable-platform-adapters` per
      CONTRIBUTING.md branch convention
- [ ] Commit prefix: `feat(gateway): pluggable platform adapters
      via plugins/platforms/` per Conventional Commits + scope `gateway`
- [ ] PR description covers what/why + how-to-test + platforms tested,
      per CONTRIBUTING.md PR-description requirements
- [ ] Open PR against `NousResearch/hermes-agent` main once the
      Discussion lands consensus
- [ ] Track the PR; bump cadence weekly; if stalled past 4 weeks,
      propose fork-and-bundle as a fallback for our hermes template image

docs/integrations/runtime-native-mcp-status.md (new file, 162 lines)
@@ -0,0 +1,162 @@
# Runtime native-MCP push parity — status

**Goal:** every workspace runtime delivers Molecule A2A inbox messages
with the same UX as claude-code's MCP `notifications/claude/channel`
push: session continuity + queued or interrupted handling of new
messages mid-thread, no fresh subprocess per message.

Tracked across four runtime streams. Updated 2026-05-02.

---
## claude-code

**Status:** ✅ Done. Native MCP `notifications/claude/channel` push
shipped via `workspace/a2a_mcp_server.py`. Requires the host to launch
with `--dangerously-load-development-channels server:molecule`.

No further work.

---
## OpenClaw

**Status:** Scaffolded; awaiting validation + companion adapter rewrite.

**Path:** Channel-plugin SDK (`openclaw/plugin-sdk`), auto-discovered
from `~/.openclaw/plugins/<name>/` or workspace `.openclaw/`. The plugin
registers an HTTP webhook on `openclaw gateway`; the Molecule workspace
adapter POSTs A2A messages to it; the gateway dispatches through the same
`dispatchReplyWithBufferedBlockDispatcher` kernel call that the native
channels (Telegram, Lark, Slack, Discord) use.

**Artifacts landed:**
- `molecule-ai-workspace-template-openclaw/packages/openclaw-channel-plugin/`
  - `package.json`, `openclaw.plugin.json` (manifest), `index.ts`
    (channel + webhook handler), `README.md`, `tsconfig.json`
- Pre-release `v0.1.0-pre`. Mirrors the `rabbit-lark-bot` reference
  plugin shape.
**Remaining (task #84, #87):**
1. Validate against a running OpenClaw gateway. Open questions in the
   plugin README: `resolveAgentRoute` peer-id shape,
   `dispatchReplyWithBufferedBlockDispatcher` async semantics,
   `outbound.sendText` no-op safety.
2. Rewrite the Python adapter (`adapter.py`) to stop shelling out to
   `openclaw agent --message ...` and instead POST to the plugin's
   webhook + run an `/agent-reply` callback HTTP server. **Post-demo
   work** (touches a working integration).

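The adapter rewrite has two small moving parts: a webhook POST out to the gateway plugin, and an `/agent-reply` callback endpoint back in. A minimal sketch, assuming illustrative payload field names (`peerId`, `text`) that are not the plugin's confirmed contract:

```python
# Sketch of the rewritten OpenClaw adapter flow. Payload field names and
# the /agent-reply path are assumptions pending gateway validation.
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def build_payload(peer_id: str, text: str) -> bytes:
    """Serialize one inbound A2A message for the channel plugin's webhook."""
    return json.dumps({"peerId": peer_id, "text": text}).encode("utf-8")

def post_to_webhook(url: str, peer_id: str, text: str) -> int:
    """POST the A2A message to the gateway plugin instead of shelling out."""
    req = urllib.request.Request(
        url,
        data=build_payload(peer_id, text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status

class AgentReplyHandler(BaseHTTPRequestHandler):
    """Callback endpoint the plugin would hit with the agent's reply."""
    def do_POST(self):
        if self.path != "/agent-reply":
            self.send_error(404)
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        reply = json.loads(body)
        # Hand the reply back to the Molecule A2A response queue here.
        self.send_response(204)
        self.end_headers()
```

`HTTPServer(("127.0.0.1", 0), AgentReplyHandler)` would then run alongside the adapter's main loop; the real wiring belongs in the post-demo rewrite.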
---

## hermes

**Status:** Upstream PR drafted; short-term shim deemed unnecessary.

**Path:** Open the upstream `BasePlatformAdapter` system to external
plugins. Hermes already ships a working plugin-discovery system for
memory backends (`plugins/memory/`, `register(ctx)` collector pattern,
`$HERMES_HOME/plugins/<name>/` user-installed tier). The PR extends
the same shape to platforms — `register_platform_adapter(...)` on the
existing collector, a new `plugins/platforms/` discovery directory, and a
3-line fallback in `_create_adapter()`. Symmetric, not novel.

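For illustration, a platform plugin under the proposed discovery directory might look like the sketch below. The `register(ctx)` entry point copies the existing memory-plugin shape, but the `register_platform_adapter` hook name and the adapter class are the PR's proposal, not shipped hermes API:

```python
# Hypothetical plugins/platforms/molecule_a2a/plugin.py under the
# proposed collector pattern (hook names are assumptions until the
# upstream PR lands).

class MoleculeA2AAdapter:
    """Stand-in for a BasePlatformAdapter subclass."""
    platform_name = "molecule-a2a"

    def __init__(self, config: dict):
        self.config = config

def register(ctx):
    # Same collector shape as memory plugins, extended to platforms:
    # _create_adapter() would fall back to this registry when the
    # built-in Platform lookup misses.
    ctx.register_platform_adapter(MoleculeA2AAdapter.platform_name,
                                  MoleculeA2AAdapter)
```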
**Artifacts landed:**
- `docs/integrations/hermes-platform-plugins-upstream-pr.md` — full
  PR draft including problem, prior art, proposal, code shape,
  backward compat, test plan, and four open questions to resolve in
  Discord before submitting.

**Why no short-term polling shim:** the earlier framing was wrong. The
Molecule runtime already polls the inbox via `wait_for_message` each
turn; each polled message fires a fresh `execute()` on the adapter, which
proxies to hermes's stateless `/v1/chat/completions`. Adding adapter-side
polling would duplicate that work. The genuine short-term gap is
**session continuity** (the hermes daemon doesn't see a single
conversation across turns because chat/completions is stateless), not
push latency. That gap is solved by the upstream PR; no
intermediate shim earns its complexity.
**Remaining (task #83):**
1. Reach out in the Nous Research Discord to validate the open questions
   (Platform enum-vs-string refactor, naming, example-plugin scope).
2. Submit the PR to `NousResearch/hermes-agent`. **Requires user
   confirmation** — opening an upstream PR is an action visible to
   others.
3. Once merged: ship `hermes-platform-molecule-a2a` as the first
   external consumer, bump our hermes workspace template to enable
   it, and remove any transitional code.

---

## Codex (OpenAI Codex CLI)

**Status:** Template structurally complete (12 files, 12/12 tests passing,
validated against real codex-cli 0.72.0). Awaiting molecule-core
registry integration + E2E.

**Path:** Persistent `codex app-server` stdio JSON-RPC client
(NDJSON-framed, v2 protocol). One app-server child per workspace
session; one `thread/start` per session; each A2A message becomes a
`turn/start` RPC; agent responses arrive as `agent_message_delta`
notifications. Per-thread serialization for mid-turn arrivals
(matches OpenClaw's per-chat sequentializer). Optional
`turn/interrupt` for "latest message wins" workspaces.

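The wire mechanics above (one JSON-RPC message per newline-delimited line over the child's stdio) can be sketched as follows; the helper names and the `turn/start` param shape are illustrative, not the template's actual API:

```python
# Minimal NDJSON JSON-RPC framing for a codex app-server client.
# The real client lives in app_server.py; names here are illustrative.
import itertools
import json

_ids = itertools.count(1)

def encode_request(method: str, params: dict) -> bytes:
    """Frame one JSON-RPC request as a single newline-terminated line."""
    msg = {"jsonrpc": "2.0", "id": next(_ids),
           "method": method, "params": params}
    return (json.dumps(msg) + "\n").encode("utf-8")

def decode_line(line: bytes) -> dict:
    """Each stdout line from the child is one complete JSON-RPC message."""
    return json.loads(line.decode("utf-8"))

# An inbound A2A message becomes a turn/start RPC against the
# session's long-lived thread:
frame = encode_request("turn/start",
                       {"threadId": "thr_123", "input": "new A2A message"})
```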
**Artifacts landed:**
- `docs/integrations/codex-app-server-adapter-design.md` — full design
  including RPC sequence, executor skeleton, eight open questions.
- `molecule-ai-workspace-template-codex/` — full template repo
  scaffolded:
  - `app_server.py` (286 LOC) — async JSON-RPC over NDJSON stdio
  - `executor.py` (~270 LOC) — thread bootstrap, turn dispatch,
    notification accumulation, mid-turn serialization
  - `adapter.py` — thin `BaseAdapter` shell + preflight
  - `Dockerfile`, `start.sh`, `config.yaml`, `requirements.txt`,
    `README.md`
  - `tests/` — **12/12 tests pass** (7 vs the NDJSON mock child, 5 vs a
    fake AppServerProcess covering executor logic)

**Validated against live `codex-cli 0.72.0`:** NDJSON framing, the
`initialize` handshake, and `thread/start` all work end-to-end.
**Schema-runtime drift caught:** the real binary returns `thread.id`,
not `thread.threadId` as the JSON schema claims. The executor now
accepts both shapes; without the smoke test this would have been
a production bug.

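The drift fix reduces to tolerating both result shapes when extracting the thread id; a sketch (the executor's real helper may differ):

```python
# Accept both the documented and the observed thread/start result shapes:
# the JSON schema says thread.threadId, but codex-cli 0.72.0 returns
# thread.id.
def extract_thread_id(result: dict) -> str:
    thread = result["thread"]
    thread_id = thread.get("threadId") or thread.get("id")
    if not thread_id:
        raise ValueError(f"thread/start result missing thread id: {result!r}")
    return thread_id
```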
**Remaining (task #85, #86):**
1. Register `codex` in molecule-core's `manifest.json` +
   `workspace-server/internal/handlers/runtime_registry.go`.
   **Defer to post-demo** — touches the working live registry.
2. E2E verification with a real Molecule workspace + peer A2A
   traffic, per `feedback_close_on_user_visible_not_merge`.

---
## Cross-cutting (task #86)

End-to-end verification per `feedback_close_on_user_visible_not_merge`.
For each runtime, the closure criterion is not "code merged" but
"observed: real workspace boots → A2A message from peer agent →
delivered to running session → reply returned through A2A response
queue → peer agent receives". No runtime stream closes until that
chain is observed.

---
## What's blocking what

| Stream | Blocked on |
|---|---|
| claude-code | (done) |
| OpenClaw plugin | live gateway validation, then post-demo adapter rewrite |
| OpenClaw adapter rewrite | post-demo timing |
| hermes upstream PR | user confirmation to submit + Discord pre-validation |
| hermes consumer plugin | upstream PR merging |
| codex implementation | resolve 8 open questions, then post-demo eng time |
| E2E verification | each runtime stream completing |

Three of the four runtime streams are at decision points needing user
input. Pre-demo (T-4d to 2026-05-06), the safe move is to land the
remaining design + scaffolding work and defer all behavioral changes to
post-demo.
@@ -32,7 +32,8 @@
     {"name": "deepagents", "repo": "Molecule-AI/molecule-ai-workspace-template-deepagents", "ref": "main"},
     {"name": "hermes", "repo": "Molecule-AI/molecule-ai-workspace-template-hermes", "ref": "main"},
     {"name": "gemini-cli", "repo": "Molecule-AI/molecule-ai-workspace-template-gemini-cli", "ref": "main"},
-    {"name": "openclaw", "repo": "Molecule-AI/molecule-ai-workspace-template-openclaw", "ref": "main"}
+    {"name": "openclaw", "repo": "Molecule-AI/molecule-ai-workspace-template-openclaw", "ref": "main"},
+    {"name": "codex", "repo": "Molecule-AI/molecule-ai-workspace-template-codex", "ref": "main"}
   ],
   "org_templates": [
     {"name": "molecule-dev", "repo": "Molecule-AI/molecule-ai-org-template-molecule-dev", "ref": "main"},