Three required fixes from the bundle review of 391e1872:
1. workspace/a2a_client.py: substring `type_name in msg` could miss
the diagnostic prefix when an exception's message embedded a
different class name mid-string (e.g. `OSError("see ConnectionError
below")` → printed as plain msg, type lost). Switched to a
prefix-anchored check (`msg.startswith(f"{type_name}:")` etc.) so
the type label is always added when not already at the start of
the message.
2. workspace/a2a_tools.py: `activity_logs.error_detail` is unbounded
TEXT on the platform (handlers/activity.go does not validate
length). A buggy or hostile peer could stream arbitrarily large
error messages into the caller's activity log. Cap at 4096 chars
at the producer — comfortably above any real exception traceback,
well below an obvious-DoS threshold.
3. New regression test for JSON-RPC `code=0` — pins the
`code is not None` semantics so the code is preserved in the
detail rather than collapsing into the no-code path. Code=0 is
not valid per the spec, but a malformed peer can still emit it
and we want it visible for diagnosis.
Plus one optional taken: extracted the A2A-error → hint mapping into
canvas/src/components/tabs/chat/a2aErrorHint.ts. The two prior copies
(AgentCommsPanel.inferCauseHint + ActivityTab.inferA2AErrorHint) had
already drifted — Activity tab gained `not found`/`offline` cases the
chat panel never picked up, AgentCommsPanel handled empty-input
explicitly while Activity didn't. The shared module is the merged
superset, with 10 unit tests pinning each named pattern + the
"most specific first" ordering (Claude SDK wedge wins over generic
timeout).
Skipped (per analysis):
- Unicode-naive 120-char slice — Python str[:N] slices on code
points, not bytes. Safe.
- Nested [A2A_ERROR] confusion — non-issue per reviewer; outer
prefix winning still produces a structured render.
- MessagePreview + JsonBlock dual render on errors — intentional
drilldown; raw JSON is below the fold for operators who need it.
- console.warn dedup — refetches don't happen per-event so spam
risk is low.
- str(data)[:200] materialization — A2A response bodies aren't
typically MB-sized.
Verified: 1005 canvas tests pass (10 new hint tests); 10 Python
send_a2a_message tests pass (1 new for code=0); tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symptom: Activity tab and Agent Comms surfaced bare "[A2A_ERROR] "
(prefix + nothing) for failed delegations. Operator had no signal
to act on — no exception type, no target, no hint about what went
wrong, no next step. Fix is in three layers.
1. workspace/a2a_client.py — every error path now produces an
actionable detail string:
- except branch: some httpx exceptions (RemoteProtocolError,
ConnectionReset variants) stringify to "". Pre-fix the catch
was `f"{_A2A_ERROR_PREFIX}{e}"` → bare prefix. Now falls back
to `<TypeName> (no message — likely connection reset or silent
timeout)` and always appends `[target=<url>]` for traceability
in chained delegations.
- JSON-RPC error branch: previously dropped error.code on the
floor and printed "unknown" when message was missing. Now
surfaces both, including the well-defined "JSON-RPC error
with no message (code=N)" path.
- "neither result nor error" branch: pre-fix returned
str(payload) which the canvas rendered as a successful
response block. Now tagged as A2A_ERROR with a payload
snippet so downstream UI routes through the error path.
2. workspace/a2a_tools.py — tool_delegate_task now passes
error_detail (the stripped error message) through to the
activity-log POST. The platform's activity_logs.error_detail
column is the canvas's red error chip source; populating it
makes the failure visible in the row header without the user
having to expand into raw response_body JSON. The summary line
also gets a 120-char prefix of the cause so the collapsed row
reads "React Engineer failed: ConnectionResetError: ... [target=...]"
instead of "React Engineer failed".
3. canvas/src/components/tabs/ActivityTab.tsx — MessagePreview
now detects [A2A_ERROR]-prefixed bodies and renders a
structured error block (red chip, stripped detail, cause hint)
instead of the previous gray text-block that showed the literal
"[A2A_ERROR]" string. inferA2AErrorHint mirrors the patterns
from AgentCommsPanel.inferCauseHint so the same symptom reads
the same way in both surfaces (Claude SDK init wedge → restart
workspace; timeout → busy/stuck; connection-reset → transient
blip then check logs).
Tests: 9 send_a2a_message tests pass (including a new regression
test for the empty-stringifying-exception case that the user
reported); 995 canvas tests pass; tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workspace runtime fired four classes of A2A request to the platform
without the X-Workspace-ID header that identifies the source
workspace: heartbeat self-messages, initial_prompt, idle-loop fires,
and peer-to-peer A2A from runtime tools. The platform's a2a_receive
logger keys source_id off that header — without it, every such row
was written with source_id=NULL, which the canvas's My Chat tab
filters as ?source=canvas (i.e. "user typed this") and rendered the
internal triggers as if the human user had sent them. The
"Delegation results are ready..." heartbeat trigger was visible to
end users in the chat history; delegate_task A2A calls between agents
were misclassified the same way.
Centralise the header construction in a new platform_auth helper
self_source_headers(workspace_id) that returns auth_headers() PLUS
{X-Workspace-ID: <id>}. Apply it to:
- heartbeat.py self-message (refactored from inline header dict)
- main.py initial_prompt POST
- main.py idle_prompt POST
- a2a_client.py send_a2a_message (peer A2A from runtime)
- builtin_tools/a2a_tools.py delegate_task (was missing ALL headers)
Tests:
- test_heartbeat.py asserts the X-Workspace-ID header is set on
the self-message POST.
- test_a2a_tools_module.py asserts the same on delegate_task POSTs;
FakeClient.post mocks updated to accept the headers kwarg.
Production effect lands the moment workspace containers are rebuilt
with this code; existing rows in activity_logs keep their NULL
source_id (legacy data). The canvas-side filter (#follow-up)
covers the historical-rows case until backfill.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- PLATFORM_URL: replace unreachable http://platform:8080 mesh-only default
with Docker-aware detection (host.docker.internal in containers,
localhost for local dev) across all workspace Python modules and the
git-token-helper shell script.
- WORKSPACE_ID: add fail-fast validation in main.py (SystemExit if empty)
consistent with coordinator.py / a2a_cli.py patterns already in place.
- INCIDENT_LOG.md: replace all 3 F1088 credential types with
***REDACTED*** (sk-cp- 2x, github_pat_ 2x, ADMIN_TOKEN base64 3x).
Fixes#1124, #1333.
Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>