Root cause (from infra-lead PR#7 review id=724): Sanitization in PR#7 wrapped
peer text in [A2A_RESULT_FROM_PEER] markers, but the markers themselves were
not escaped — a malicious peer could inject "[/A2A_RESULT_FROM_PEER]" to close
the trust boundary early, making subsequent text appear inside the trusted zone.

Fix:
- Create workspace/_sanitize_a2a.py (leaf module, no circular import risk)
  with shared sanitize_a2a_result() + _escape_boundary_markers()
- _escape_boundary_markers() escapes boundary open/close markers in the raw
  peer text before wrapping (primary security control)
- Defense-in-depth: also escapes SYSTEM/OVERRIDE/INSTRUCTIONS/IGNORE ALL/
  YOU ARE NOW patterns (secondary, per PR#7 design intent)
- Update a2a_tools_delegation.py: import from _sanitize_a2a; wrap
  tool_delegate_task return and tool_check_task_status response_preview
- Add 15 tests covering boundary escape, injection patterns, integration
  shapes (workspace/tests/test_a2a_sanitization.py)

Follow-up (non-blocking, noted in PR#7 infra-lead review):
- Deduplicate if a2a_tools.py also wraps (currently handled in the delegation
  module only — callers get sanitized output regardless)
- tool_check_task_status: consider sanitizing the 'summary' field too

Closes: molecule-ai/molecule-ai-workspace-runtime#7 (wrong-repo PR that this
supersedes)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
100 lines
4.5 KiB
Python
"""OFFSEC-003: A2A peer-result sanitization — shared across delegation tools.
|
|
|
|
This module is intentionally a LEAF (no imports from the molecule-runtime
|
|
package) to avoid circular dependency cycles. Both ``a2a_tools_delegation``
|
|
and ``a2a_tools`` can import from here without creating import loops.
|
|
|
|
Trust-boundary design (OFFSEC-003):
|
|
A2A peer responses are untrusted third-party content. Before passing
|
|
them to the agent context, they MUST be wrapped in a trust-boundary
|
|
marker pair so the calling agent knows the content is external.
|
|
|
|
Boundary markers:
|
|
- _A2A_BOUNDARY_START = "[A2A_RESULT_FROM_PEER]"
|
|
- _A2A_BOUNDARY_END = "[/A2A_RESULT_FROM_PEER]"
|
|
|
|
The boundary is the PRIMARY security control. A peer that sends
|
|
"[A2A_RESULT_FROM_PEER]evil[/A2A_RESULT_FROM_PEER]safe" can make "safe"
|
|
appear inside the trusted context unless the markers themselves are
|
|
escaped before wrapping — see _escape_boundary_markers() below.
|
|
|
|
Defense-in-depth (secondary):
|
|
Known prompt-injection control-words are also escaped so that even
|
|
if a calling agent ignores the boundary marker, embedded attack
|
|
patterns (SYSTEM:, OVERRIDE:, etc.) lose their special meaning.
|
|
This is not a complete injection sanitizer — do not rely on it as
|
|
the primary control.
|
|
"""
|
|
|
|
from __future__ import annotations
|
|
|
|
import re
|
|
|
|
# ── Trust-boundary markers ────────────────────────────────────────────────────

_A2A_BOUNDARY_START = "[A2A_RESULT_FROM_PEER]"
_A2A_BOUNDARY_END = "[/A2A_RESULT_FROM_PEER]"


# ── Boundary-marker escaping ─────────────────────────────────────────────────
# A peer that sends "[/A2A_RESULT_FROM_PEER]evil" can make "evil" appear
# inside the trusted zone. Escape BOTH boundary markers in the raw text
# before wrapping so they can never close the boundary early.
# We use "[/ " as the escape prefix — visually distinct from the real marker.


def _escape_boundary_markers(text: str) -> str:
    """Escape boundary markers inside the raw peer text before wrapping.

    Replaces any occurrence of the boundary start/end markers with a
    visually-similar escaped form so a malicious peer can never close
    the boundary early or inject a fake opener.
    """
    return (
        text.replace(_A2A_BOUNDARY_START, "[/ A2A_RESULT_FROM_PEER]")
        .replace(_A2A_BOUNDARY_END, "[/ /A2A_RESULT_FROM_PEER]")
    )


# ── Defense-in-depth: injection pattern escaping ──────────────────────────────
# These patterns cover common prompt-injection phrasings. They are NOT a
# complete sanitizer — see module docstring. The boundary marker is the
# primary control; these are purely defense-in-depth.

_INJECTION_PATTERNS = [
    # Single-word patterns: anchor to a word boundary so they don't match
    # inside other words (e.g. "SYSTEM" in "mySYSTEMatic").
    (re.compile(r"(^|[^\w])SYSTEM\b", re.IGNORECASE), r"\1[ESCAPED_SYSTEM]"),
    (re.compile(r"(^|[^\w])OVERRIDE\b", re.IGNORECASE), r"\1[ESCAPED_OVERRIDE]"),
    # "INSTRUCTIONS" may appear at the start of a string or after a newline;
    # keep the captured anchor so the surrounding layout is preserved.
    (re.compile(r"(^|\n)INSTRUCTIONS?\b", re.IGNORECASE), r"\1[ESCAPED_INSTRUCTIONS]"),
    (re.compile(r"(^|[^\w])IGNORE\s+ALL\b", re.IGNORECASE), r"\1[ESCAPED_IGNORE_ALL]"),
    (re.compile(r"(^|[^\w])YOU\s+ARE\s+NOW\b", re.IGNORECASE), r"\1[ESCAPED_YOU_ARE_NOW]"),
]


def sanitize_a2a_result(text: str) -> str:
    """Sanitize and wrap untrusted text from an A2A peer (OFFSEC-003).

    Order of operations:
      1. Escape boundary markers in the raw text (prevents injection).
      2. Escape known injection patterns (defense-in-depth).
      3. Wrap in trust-boundary markers.

    Returns the input unchanged if it is empty/None.
    """
    if not text:
        return text

    # 1. Escape boundary markers so a malicious peer cannot break the
    #    trust boundary from inside their response.
    escaped = _escape_boundary_markers(text)

    # 2. Escape known injection control-words (defense-in-depth only).
    for pattern, replacement in _INJECTION_PATTERNS:
        escaped = pattern.sub(replacement, escaped)

    # 3. Wrap in trust-boundary markers.
    return f"{_A2A_BOUNDARY_START}\n{escaped}\n{_A2A_BOUNDARY_END}"