molecule-core/workspace/_sanitize_a2a.py
Molecule AI Fullstack Engineer 8da86ac0f7
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
sop-tier-check / tier-check (pull_request) Failing after 6s
audit-force-merge / audit (pull_request) Has been skipped
fix(workspace): sanitize trust-boundary markers in read_delegation_results (closes #361)
OFFSEC-003 follow-up: a malicious peer could inject fake [A2A_ERROR],
[SYSTEM], [A2A_QUEUED] or [IGNORE] blocks via response_preview/summary
in delegation results. The agent's prompt sees these unescaped markers and
may interpret them as system commands.

Add _sanitize_a2a.py — a dedicated sanitizer that inserts ZERO-WIDTH SPACE
(U+200B) INSIDE the opening bracket of each known marker (e.g.
"[A2A_ERROR]" → "[​A2A_ERROR]"). The raw marker string no longer
matches naive substring/regex checks while the text remains human-readable.

Apply sanitize_a2a_result() to both summary and response_preview fields
in read_delegation_results() before formatting them for prompt injection.

Tests: 18 new cases in test_sanitize_a2a.py covering marker escaping,
idempotency, injection scenarios, and edge cases. 1 integration case in
test_executor_helpers.py.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 04:21:42 +00:00

82 lines
3.2 KiB
Python
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

"""A2A trust-boundary sanitizer — escapes markers in peer-supplied text.
Issue #346 / OFFSEC-003.
Peer agents can return text that contains trust-boundary markers our own code
uses (e.g. [A2A_ERROR], [A2A_QUEUED]). If this text reaches the agent's prompt
context, a malicious peer could inject fake error/control blocks to manipulate
the agent's behavior.
This module provides `sanitize_a2a_result` which inserts a ZERO-WIDTH SPACE
(U+200B) between the opening `[` and the marker text, breaking regex/string
pattern matches while being invisible to humans reading the content.
The ZERO-WIDTH SPACE is used because:
1. It is invisible in all common fonts and terminals
2. It is a valid Unicode character (Category Cf: Format)
3. It does not affect LLM tokenization meaningfully
4. The agent cannot easily "fix" it back because it can't see it
"""
from __future__ import annotations
import re
from typing import TYPE_CHECKING
if TYPE_CHECKING:
pass
# Zero-width space — the "escape" character inserted inside the bracket.
ZWSP = ""
# Known trust-boundary markers that appear in square-bracket form.
# These are the ones our own code generates and the ones a malicious peer
# might try to inject. Each entry: (regex, replacement_template).
# The replacement puts ZWSP INSIDE the opening bracket so that "[A2A_ERROR]"
# becomes "[A2A_ERROR]" — the raw marker string no longer appears as a
# contiguous substring, but the text remains human-readable.
_TRUST_MARKER_PATTERNS: list[tuple[re.Pattern[str], str]] = [
# Our own sentinels (from a2a_client.py)
(re.compile(r"\[(A2A_ERROR)\]", re.IGNORECASE), "[\\1]"),
(re.compile(r"\[(A2A_QUEUED)\]", re.IGNORECASE), "[\\1]"),
# System-level markers (open-bracket form — captures content after "[")
(re.compile(r"\[(SYSTEM)\b"), "[\\1"),
(re.compile(r"\[(SYSTEM)\]", re.IGNORECASE), "[\\1]"),
(re.compile(r"\[(AGENT)\b"), "[\\1"),
# Generic control markers a peer might inject
(re.compile(r"\[(ADMIN)\b"), "[\\1"),
(re.compile(r"\[(BYPASS)\b"), "[\\1"),
(re.compile(r"\[(IGNORE)\b"), "[\\1"),
]
def sanitize_a2a_result(text: str) -> str:
"""Escape trust-boundary markers in peer-supplied A2A response text.
Inserts a ZERO-WIDTH SPACE (U+200B) INSIDE the opening bracket of each
known marker (e.g. ``[A2A_ERROR]`` → ``[A2A_ERROR]``), so that the raw
marker string no longer appears as a contiguous substring and naive pattern
checks do not fire on peer-supplied content.
Idempotent — running sanitized text through this function again is a no-op
because the ZWSP is already inside the brackets.
Args:
text: Raw peer-supplied text from ``response_preview`` or ``summary``
fields in delegation results.
Returns:
The input text with ZWSP escape characters inserted inside each
opening ``[`` that starts a known trust-boundary marker.
"""
if not text:
return text
result = text
for pattern, replacement in _TRUST_MARKER_PATTERNS:
# Use regex backreference to preserve the captured marker text,
# with ZWSP inserted after the opening "[".
result = pattern.sub(replacement, result)
return result