Send initialized notification after initialize handshake (fix codex 0.130 wedge — internal#659 P1#1) #48

Merged
hongming merged 2 commits from fix/send-initialized-notification into main 2026-05-24 09:57:57 +00:00
Owner

Summary

Send the initialized notification (JSON-RPC, no id) after the initialize handshake response, before any further request. The codex app-server protocol requires this — without it, codex-cli 0.130 silently wedges every subsequent request.

Root cause

Per codex-rs/app-server/README.md and codex-rs/app-server-protocol/schema/json/ClientNotification.json, the only client→server notification defined is initialized, and it is mandatory after the initialize response and before any further method invocation. Our app_server.py was missing this entirely — grep -E 'initialized|notify' app_server.py returned zero matches before this patch.

codex-cli 0.72.x (against which executor.py was originally verified — see line ~187 comment dated 2026-05-02) was permissive. codex-cli 0.130.0 (current runtime image pin) is strict.
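
The contract described in this section can be illustrated with the two message shapes involved. This is a minimal sketch, not the code from `app_server.py`: the helper names `make_request` and `make_notification` are hypothetical, the `initialize` params are a placeholder, and any envelope field such as `jsonrpc` is left out because this PR does not state the exact wire format.

```python
import json
from typing import Any, Optional


def make_request(request_id: int, method: str, params: Optional[dict] = None) -> dict:
    """Build a request: it carries an id, so the server is expected to reply."""
    message: dict[str, Any] = {"id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


def make_notification(method: str, params: Optional[dict] = None) -> dict:
    """Build a notification: no id, so no reply is expected."""
    message: dict[str, Any] = {"method": method}
    if params is not None:
        message["params"] = params
    return message


# Handshake order from the summary: the initialize request first, then the
# initialized notification, and only then any further request.
handshake = [
    make_request(1, "initialize", {}),  # placeholder params (assumption)
    make_notification("initialized"),
]
wire_lines = [json.dumps(message) for message in handshake]
```

The only structural difference between the two messages is the `id` key, which is why a client with no notification primitive cannot complete the handshake.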

Production evidence — 2026-05-24

Both codex agents in agents-team prod (CR2 4e817f43-a0b7-4c44-9fcb-d6b2e7d4dda1 and Researcher 712b5600-d397-4749-adac-cd6ee574afea) exhibited the wedge:

  • Registered with platform: 200 ✓
  • POST / HTTP/1.1 200 OK on every inbound delegation ✓
  • cron_run + a2a_receive recorded in activity_logs ✓
  • agent_log entries produced: 0 for the entire session ✗

Pattern survived docker restart + full re-provision (POST /workspaces/:id/restart). Both restart modes preserved the symptom because neither sent initialized.

Fix

  • app_server.py: add notify(method, params=None) public method that writes a JSON-RPC notification (no id) through the existing _write_message path. Update initialize() to call await self.notify("initialized") after the response.
  • tests/mock_app_server.py: record incoming notification methods in a module list; expose them via a test-only get_received_notifications RPC.
  • tests/test_app_server.py: two new tests asserting (1) initialized is sent after initialize, (2) notify() produces a no-id JSON-RPC message.

12/12 tests pass locally including the two new ones. No changes to executor.py — the fix is fully contained in the JSON-RPC client primitive.
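
The client-side change listed above might look roughly like the following. This is a sketch under stated assumptions, not the merged code: `SketchClient` is a hypothetical stand-in, the `sent` list replaces the real transport, and `request()` returns a canned response instead of waiting for the server. Only the names `notify`, `initialize`, `_write_message` and `_write_lock` come from the PR text.

```python
import asyncio
import json
from typing import Any, Optional


class SketchClient:
    """Hypothetical stand-in for the JSON-RPC client in app_server.py."""

    def __init__(self) -> None:
        self.sent: list[dict[str, Any]] = []  # stands in for the real transport
        self._write_lock = asyncio.Lock()
        self._next_id = 0

    async def _write_message(self, message: dict[str, Any]) -> None:
        # Single write path shared by requests and notifications.
        async with self._write_lock:
            self.sent.append(json.loads(json.dumps(message)))

    async def request(self, method: str, params: Optional[dict] = None) -> dict:
        self._next_id += 1
        message: dict[str, Any] = {"id": self._next_id, "method": method}
        if params is not None:
            message["params"] = params
        await self._write_message(message)
        # Canned response; a real client would await the server's reply here.
        return {"id": message["id"], "result": {}}

    async def notify(self, method: str, params: Optional[dict] = None) -> None:
        # A notification has no id, and params is omitted entirely when None.
        message: dict[str, Any] = {"method": method}
        if params is not None:
            message["params"] = params
        await self._write_message(message)

    async def initialize(self) -> dict:
        response = await self.request("initialize", {})
        # Sent after the response lands and before any further request.
        await self.notify("initialized")
        return response


async def demo() -> list:
    client = SketchClient()
    await client.initialize()
    await client.request("thread/start")
    return client.sent


sent = asyncio.run(demo())
```

Placing the `notify("initialized")` call inside `initialize()` means callers such as `executor.py` need no change, which matches the claim that the fix is contained in the client primitive.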

Test plan

  • [x] pytest tests/test_app_server.py -x -q (12/12 pass)
  • [ ] Merge → image build → promote pin → re-provision CR2 + Researcher → verify agent_log count > 0 within one cron tick
  • [ ] If verified, mark P1#1 root cause in internal#659 as resolved

Related

  • internal#659 P1#1: root cause analysis (this PR is the smallest sufficient fix)
  • molecule-core#1644 Part B: companion (auth_token in 201) for the recovery path; not blocked by this
  • Upstream contract: https://github.com/openai/codex/blob/main/codex-rs/app-server/README.md

🤖 Generated with Claude Code
hongming added 1 commit 2026-05-24 09:34:34 +00:00
Send initialized notification after initialize handshake
CI / Template validation (runtime) (push) Blocked by required conditions
CI / T4 tier-4 conformance (live) (push) Blocked by required conditions
CI / validate (push) Blocked by required conditions
CI / Adapter unit tests (push) Successful in 19s
CI / Template validation (static) (push) Successful in 38s
CI / Adapter unit tests (pull_request) Successful in 18s
CI / Template validation (static) (pull_request) Successful in 43s
CI / Template validation (runtime) (pull_request) Successful in 2m5s
CI / T4 tier-4 conformance (live) (pull_request) Successful in 2m3s
CI / validate (pull_request) Successful in 3s
f0a7a8ebb0
The codex app-server protocol requires clients to send an `initialized`
notification (no id, JSON-RPC) after receiving the `initialize` response
and BEFORE issuing any further request. The contract is documented in
codex-rs/app-server/README.md and codex-rs/app-server-protocol/schema/
json/ClientNotification.json (only client→server notification defined).

Without this notification, codex-cli 0.130.0 (current production image)
silently wedges every subsequent request — thread/start, turn/start —
with no agent_log output. Reproduced live 2026-05-24 against agents-team
prod CR2 (4e817f43…) and Researcher (712b5600…): both received POST /
200 OK but produced zero agent_log activity for the entire session.

codex-cli 0.72.x was permissive about a missing initialized; 0.130.0 is
strict. The executor.py comment dating verification to 2026-05-02 against
0.72.0 was effectively a regression vector when the runtime-image pin
advanced past 0.72.

Fix is three parts:
  - app_server.py: add notify() method (JSON-RPC notification, no id),
    and call notify('initialized') at end of initialize() after the
    response lands.
  - mock_app_server.py: record received notifications + expose them via
    a get_received_notifications test-only RPC.
  - test_app_server.py: two new tests asserting (1) initialized is sent
    after initialize, (2) notify() produces a no-id JSON-RPC message.

Root cause tracking: internal#659 P1#1. This is the smallest sufficient
fix; the broader codex template restart-safety work is tracked separately
under that issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hongming added 1 commit 2026-05-24 09:39:14 +00:00
review: harden mock notification recording against task-FIFO race
CI / Template validation (static) (push) Successful in 32s
CI / Adapter unit tests (push) Successful in 15s
CI / Adapter unit tests (pull_request) Successful in 15s
CI / Template validation (static) (pull_request) Successful in 28s
CI / Template validation (runtime) (pull_request) Successful in 1m30s
CI / Template validation (runtime) (push) Successful in 1m42s
CI / T4 tier-4 conformance (live) (push) Successful in 1m39s
CI / T4 tier-4 conformance (live) (pull_request) Successful in 1m4s
CI / validate (push) Successful in 4s
CI / validate (pull_request) Successful in 13s
069a206377
AI review on #48 flagged that recording notifications inside _handle()
creates a race with a subsequent request that reads
_received_notifications: both are dispatched as separate asyncio tasks,
and ordering relies on FIFO + no-await invariants that future edits
could break unnoticed.

Fix: record notifications synchronously in main()'s read loop BEFORE
dispatching to _handle(). The handler still runs (no-op for
notifications) but the visible-state mutation now precedes any request
handler that follows on the same connection.

12/12 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sdk-lead approved these changes 2026-05-24 09:57:46 +00:00
sdk-lead left a comment
Member

APPROVE.

Protocol-compliance fix: client now sends the required initialized notification per codex-rs/app-server protocol contract (ClientNotification.json defines only this notification, README marks it mandatory after initialize response).

Root-caused live against agents-team prod CR2 + Researcher (workspaces 4e817f43..., 712b5600...) — both received POST / 200 OK but produced zero agent_log; codex-cli 0.130 strictly rejects post-initialize requests when the notification is missing. 0.72.x (against which executor.py was verified) was permissive.

Review points checked:

  • notify() implementation correct: JSON-RPC no id, reuses _write_message/_write_lock, mirrors request()'s precondition checks
  • protocol fidelity: schema confirms bare method (no params); implementation conditionally omits params when None — matches
  • side effects: only call site is executor.py:139 (await initialize() then proceeds to thread/start). No subscriber side channel. Safe.
  • test adequacy: initial mock-recording had a task-FIFO race; reviewer flagged it; commit 069a206 hardened it by recording in main()'s read loop before _handle dispatch.
  • 12/12 tests pass; CI green (10/10).

Resolves internal#659 P1#1.

hongming merged commit 993095a860 into main 2026-05-24 09:57:57 +00:00
hongming deleted branch fix/send-initialized-notification 2026-05-24 09:57:58 +00:00
Reference: molecule-ai/molecule-ai-workspace-template-codex#48