forked from molecule-ai/molecule-core
Today, if `adapter.setup()` raises (most often: an LLM credential is
missing/rotated), main.py crashes before the agent-card route is mounted.
start.sh restart-loops, /.well-known/agent-card.json never returns 200,
and the workspace is invisible to the bench/canvas; operators see
"stuck booting forever" with no clear error to act on.

The agent-card is a static capability advertisement (name, version,
skills, supported protocols). It doesn't need a working LLM. Coupling
its mount to setup() conflates *availability* ("am I up?") with
*configuration* ("can I actually answer?"). They're different concerns.

This change:
- Builds AgentCard from `config.skills` (static names from config.yaml)
  BEFORE adapter.setup(), so the route mounts independently of setup state.
- Wraps setup() + create_executor in try/except. On success, mounts
  the real DefaultRequestHandler and swaps the rich loaded_skills
  metadata into the card in place. On failure, mounts a JSON-RPC
  handler that returns HTTP 503 with -32603 "agent not configured"
  and the setup() exception in error.data.
- Keeps the heartbeat running on misconfigured boots so the platform
  marks the workspace as reachable-but-misconfigured rather than
  crash-looping. Operators redeploy with corrected env without
  chasing a restart loop.
- Skips initial_prompt and idle_loop on misconfigured boots: they
  self-fire to /, which would land in -32603 anyway, and the
  marker would be consumed by the first useless attempt.
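
The boot sequence in the list above can be sketched roughly as follows. This is a stdlib-only illustration, not the code in main.py: `BootPlan`, `plan_boot`, and the card dictionary layout are hypothetical names introduced here, and the real runtime mounts Starlette routes rather than returning a plan object.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List, Optional


@dataclass
class BootPlan:
    """Hypothetical summary of what a boot would mount and start."""

    card: Dict[str, Any]        # always served, configured or not
    configured: bool            # True only if setup() succeeded
    setup_error: Optional[str]  # stringified setup() exception, if any
    run_heartbeat: bool
    run_initial_prompt: bool
    run_idle_loop: bool


async def plan_boot(
    skill_names: List[str],
    setup: Callable[[], Awaitable[List[Dict[str, Any]]]],
) -> BootPlan:
    # Step 1: build the card from static config before touching the adapter.
    card: Dict[str, Any] = {"skills": [{"name": n} for n in skill_names]}

    # Step 2: a setup failure becomes a degraded boot instead of a crash.
    try:
        loaded_skills = await setup()
    except Exception as exc:  # noqa: BLE001 - any setup error degrades the boot
        return BootPlan(
            card=card,
            configured=False,
            setup_error=str(exc),
            run_heartbeat=True,        # stay visible to the platform
            run_initial_prompt=False,  # would only hit the error handler
            run_idle_loop=False,
        )

    # Step 3: on success, swap richer skill metadata into the same card object.
    card["skills"] = loaded_skills
    return BootPlan(
        card=card,
        configured=True,
        setup_error=None,
        run_heartbeat=True,
        run_initial_prompt=True,
        run_idle_loop=True,
    )


async def _missing_credential() -> List[Dict[str, Any]]:
    raise RuntimeError("LLM credential is not set")


async def _ok() -> List[Dict[str, Any]]:
    return [{"name": "search", "description": "Web search"}]


degraded = asyncio.run(plan_boot(["search"], _missing_credential))
healthy = asyncio.run(plan_boot(["search"], _ok))
```

In both outcomes the card exists before `setup()` is awaited, which is the property that lets the agent-card route return 200 regardless of credential state.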

Bench impact (RFC #388 strict <120s): the codex/openclaw bench timeouts
were a symptom of the agent-card never returning 200. With this fix those
runtimes serve the card immediately on EC2 boot, so the bench
measures infrastructure cold-start (claude-code class: ~50–80s)
instead of credential-coupled boot.

Adds workspace/not_configured_handler.py (a module-level factory, so the
behavior is unit-testable; main.py is `# pragma: no cover`) and
workspace/tests/test_not_configured_handler.py (6 tests covering
status code, JSON-RPC envelope shape, id-echo, malformed-body
fallback, reason surfacing, batch-body safety).

All 1665 existing workspace tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
56 lines
2.0 KiB
Python
"""Build a JSON-RPC handler that returns ``-32603 "agent not configured"``.

Used by the workspace runtime when ``adapter.setup()`` fails (most often
because an LLM credential is missing or rotated). Lets ``/.well-known/agent-card.json``
keep serving 200, so the workspace stays REACHABLE for canvas/operator
introspection, while message-send requests get a clear, immediate
error instead of silently timing out.

Kept as its own module so the behavior is unit-testable without booting
the whole runtime (main.py is ``# pragma: no cover``).
"""

from __future__ import annotations

from typing import Awaitable, Callable

from starlette.requests import Request
from starlette.responses import JSONResponse


def make_not_configured_handler(
    reason: str | None,
) -> Callable[[Request], Awaitable[JSONResponse]]:
    """Return a Starlette POST handler that always 503s with JSON-RPC -32603.

    ``reason`` is surfaced in the JSON-RPC ``error.data`` field so canvas
    can render "agent not configured: <reason>" to the user. Pass the
    stringified ``adapter.setup()`` exception. ``None`` falls back to a
    generic "adapter.setup() failed".

    The handler echoes the request's JSON-RPC ``id`` when present so a
    well-behaved JSON-RPC client can correlate the error to its request.
    Malformed bodies (non-JSON, missing id) get ``id: null`` per spec.
    """

    # Resolve the message once; every request gets the same reason.
    fallback = reason or "adapter.setup() failed"

    async def _handler(request: Request) -> JSONResponse:
        try:
            body = await request.json()
        except Exception:  # noqa: BLE001
            # Non-JSON or empty body: answer anyway, with a null id.
            body = {}
        return JSONResponse(
            {
                "jsonrpc": "2.0",
                # A batch (JSON array) or other non-object body has no single id to echo.
                "id": body.get("id") if isinstance(body, dict) else None,
                "error": {
                    "code": -32603,
                    "message": "Internal error: agent not configured",
                    "data": fallback,
                },
            },
            status_code=503,
        )

    return _handler
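
To make the id-echo and fallback rules concrete, here is a stdlib-only sketch that mirrors the handler's body handling without Starlette. `not_configured_envelope` is a hypothetical helper written for this illustration only; it takes the raw request bytes instead of a `Request` and returns the JSON body the handler would send with its 503. The request payloads and the reason string are made-up examples.

```python
import json
from typing import Any, Dict, Optional


def not_configured_envelope(raw_body: bytes, reason: Optional[str]) -> Dict[str, Any]:
    """Hypothetical stand-in for the JSON body sent alongside the 503."""
    try:
        body = json.loads(raw_body)
    except ValueError:  # JSONDecodeError and invalid UTF-8 are both ValueErrors
        body = {}
    return {
        "jsonrpc": "2.0",
        # Only a JSON object can carry a single id worth echoing.
        "id": body.get("id") if isinstance(body, dict) else None,
        "error": {
            "code": -32603,
            "message": "Internal error: agent not configured",
            "data": reason or "adapter.setup() failed",
        },
    }


# A well-formed single request: the id is echoed and the reason is surfaced.
single = not_configured_envelope(
    b'{"jsonrpc": "2.0", "id": 7, "method": "message/send"}',
    "missing credential",
)

# A non-JSON body: null id, generic fallback reason.
malformed = not_configured_envelope(b"not json", None)

# A batch (JSON array): no single id to echo, so the id is null.
batch = not_configured_envelope(b'[{"jsonrpc": "2.0", "id": 1, "method": "x"}]', None)
```

All three inputs produce the same error code and message; only `id` and `error.data` vary, which is what lets a client correlate the failure and show the operator why the agent is not configured.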