refactor(memory): remove flush_memories entirely (#15696)

The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are
obsolete now that the background memory/skill review handles
persistent memory extraction.

Problems with flush_memories:

- Pre-dates the background review loop.  It was the only memory-save
  path when introduced; the background review now fires every 10 user
  turns on CLI and gateway alike, far more frequently than compression
  or session reset ever triggered a flush.
- Blocking and synchronous.  Pre-compression flush ran on the live agent
  before compression, blocking the user-visible response.
- Cache-breaking.  Flush built a temporary conversation prefix
  (system prompt + memory-only tool list) that diverged from the live
  conversation's cached prefix, invalidating prompt caching.  The
  gateway variant spawned a fresh AIAgent with its own clean prompt
  for each finalized session — still cache-breaking, just in a
  different process.
- Redundant.  Background review runs in the live conversation's
  session context, gets the same content, writes to the same memory
  store, and doesn't break the cache.  Everything flush_memories
  claimed to preserve is already covered.
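For context, the review cadence described above can be sketched as a simple per-turn counter. The `_memory_nudge_interval` name and its default of 10 appear in the diff below; the trigger logic itself is an illustrative assumption, not the actual implementation:

```python
# Minimal sketch of the background review cadence, assuming a
# counter-based trigger. Only _memory_nudge_interval (default 10)
# is taken from the diff; the rest is hypothetical.
class ReviewCadence:
    def __init__(self, nudge_interval: int = 10):
        self._memory_nudge_interval = nudge_interval
        self._turns_since_memory = 0

    def on_user_turn(self) -> bool:
        """Return True when a background memory/skill review should fire."""
        self._turns_since_memory += 1
        if self._turns_since_memory >= self._memory_nudge_interval:
            self._turns_since_memory = 0
            return True
        return False
```

Under this model a review fires on every 10th user turn, regardless of whether compression or a session reset ever happens, which is the cadence argument the bullet makes.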

What this removes:

- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
  (and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks,
  hermes tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
  _check_compression_model_feasibility (headroom was only needed
  because flush dragged the full main-agent system prompt along;
  the compression summariser sends a single user-role prompt so
  new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
  flush-specific paths
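The #15631 headroom math being removed, and its replacement, can be contrasted in a small standalone sketch. The `tool_overhead + 12_000` formula and the 64K floor mirror the deleted code shown in the diff further down; the concrete numbers are illustrative:

```python
# Before/after sketch of the aux-threshold computation. The old path
# mirrors the #15631 code deleted in this commit; numbers are examples.
MINIMUM_CONTEXT_LENGTH = 64_000  # floor referenced in the diff's comments

def old_threshold(aux_context: int, tool_overhead: int) -> int:
    # #15631: subtract tool schemas plus a ~12K system-prompt budget,
    # needed only because flush_memories dragged the full main-agent
    # prefix (system prompt + tools) into the aux request.
    headroom = tool_overhead + 12_000
    return max(aux_context - headroom, MINIMUM_CONTEXT_LENGTH)

def new_threshold(aux_context: int) -> int:
    # Post-removal: the compression summariser sends a single user-role
    # prompt with no tools, so raw messages can use the full aux context.
    return aux_context
```

With a 200K aux model and roughly 28K of tool schemas, the old path budgeted 160K for raw messages where the new path allows the full 200K.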

What this renames (with read-time backcompat on sessions.json):

- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized.
  The session-expiry watcher still uses the flag to avoid re-running
  finalize/eviction on the same expired session; the new name
  reflects what it now actually gates.  from_dict() reads
  'expiry_finalized' first, falls back to the legacy 'memory_flushed'
  key so existing sessions.json files upgrade seamlessly.
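The read-time fallback is a one-line change in `from_dict()` (visible in the sessions diff further down); a minimal standalone sketch of the upgrade path, with the dataclass reduced to just the renamed field:

```python
# Sketch of the rename backcompat: prefer the new key, fall back to the
# legacy one. SessionEntry is trimmed to the single relevant field here.
from dataclasses import dataclass

@dataclass
class SessionEntry:
    expiry_finalized: bool = False

    @classmethod
    def from_dict(cls, data: dict) -> "SessionEntry":
        # New key wins when present; legacy 'memory_flushed' is only
        # consulted for pre-rename sessions.json files, so old state
        # loads without any migration step.
        return cls(
            expiry_finalized=data.get(
                "expiry_finalized", data.get("memory_flushed", False)
            ),
        )
```

Note the asymmetry: `to_dict()` writes only the new key, so a file round-trips to the new name after its first save.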

Supersedes #15631 and #15638.

Tested: 383 targeted tests pass across run_agent/, agent/, cli/,
and gateway/ session-boundary suites.  No behavior regressions —
background memory review continues to handle persistent memory
extraction on both CLI and gateway.
Teknium 2026-04-25 08:21:14 -07:00 committed by GitHub
parent d635e2df3f
commit ea01bdcebe
23 changed files with 78 additions and 1567 deletions

View File

@ -1680,9 +1680,9 @@ def build_anthropic_kwargs(
# ── Strip sampling params on 4.7+ ─────────────────────────────────
# Opus 4.7 rejects any non-default temperature/top_p/top_k with a 400.
# Callers (auxiliary_client, flush_memories, etc.) may set these for
# older models; drop them here as a safety net so upstream 4.6 → 4.7
# migrations don't require coordinated edits everywhere.
# Callers (auxiliary_client, etc.) may set these for older models;
# drop them here as a safety net so upstream 4.6 → 4.7 migrations
# don't require coordinated edits everywhere.
if _forbids_sampling_params(model):
for _sampling_key in ("temperature", "top_p", "top_k"):
kwargs.pop(_sampling_key, None)
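The safety-net strip in this hunk can be exercised standalone. The `_forbids_sampling_params` predicate below is a stand-in assumption (the real one lives in `agent.anthropic_adapter` and its matching rules are not shown in this diff):

```python
# Standalone sketch of the sampling-param safety net. The predicate's
# model-matching rule is assumed for illustration only.
def _forbids_sampling_params(model: str) -> bool:
    # Stand-in: treat any Opus 4.7 model id as rejecting non-default
    # temperature/top_p/top_k with a 400.
    return "opus-4.7" in model or "opus-4-7" in model

def strip_sampling_params(model: str, kwargs: dict) -> dict:
    # Mirrors the hunk: drop the three sampling keys so callers tuned
    # for older models don't 400 when the model flips to 4.7+.
    if _forbids_sampling_params(model):
        for _sampling_key in ("temperature", "top_p", "top_k"):
            kwargs.pop(_sampling_key, None)
    return kwargs
```

The point of doing this centrally is exactly what the comment says: a 4.6-to-4.7 migration needs no coordinated edits at every call site.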

View File

@ -390,7 +390,7 @@ class _CodexCompletionsAdapter:
# Note: the Codex endpoint (chatgpt.com/backend-api/codex) does NOT
# support max_output_tokens or temperature — omit to avoid 400 errors.
# Tools support for flush_memories and similar callers
# Tools support for auxiliary callers (e.g. skills_hub) that pass function schemas
tools = kwargs.get("tools")
if tools:
converted = []
@ -2803,8 +2803,8 @@ def _build_call_kwargs(
temperature = fixed_temperature
# Opus 4.7+ rejects any non-default temperature/top_p/top_k — silently
# drop here so auxiliary callers that hardcode temperature (e.g. 0.3 on
# flush_memories, 0 on structured-JSON extraction) don't 400 the moment
# drop here so auxiliary callers that hardcode temperature (e.g. 0 on
# structured-JSON extraction) don't 400 the moment
# the aux model is flipped to 4.7.
if temperature is not None:
from agent.anthropic_adapter import _forbids_sampling_params
@ -2892,7 +2892,7 @@ def call_llm(
Args:
task: Auxiliary task name ("compression", "vision", "web_extract",
"session_search", "skills_hub", "mcp", "flush_memories").
"session_search", "skills_hub", "mcp", "title_generation").
Reads provider:model from config/env. Ignored if provider is set.
provider: Explicit provider override.
model: Explicit model override.

cli.py (10 changed lines)
View File

@ -4668,10 +4668,6 @@ class HermesCLI:
def new_session(self, silent=False):
"""Start a fresh session with a new session ID and cleared agent state."""
if self.agent and self.conversation_history:
try:
self.agent.flush_memories(self.conversation_history)
except (Exception, KeyboardInterrupt):
pass
# Trigger memory extraction on the old session before session_id rotates.
self.agent.commit_memory_session(self.conversation_history)
self._notify_session_boundary("on_session_finalize")
@ -10788,12 +10784,6 @@ class HermesCLI:
self.agent.interrupt()
except Exception:
pass
# Flush memories before exit (only for substantial conversations)
if self.agent and self.conversation_history:
try:
self.agent.flush_memories(self.conversation_history)
except (Exception, KeyboardInterrupt):
pass
# Shut down voice recorder (release persistent audio stream)
if hasattr(self, '_voice_recorder') and self._voice_recorder:
try:

View File

@ -524,7 +524,7 @@ def _load_gateway_config() -> dict:
def _resolve_gateway_model(config: dict | None = None) -> str:
"""Read model from config.yaml — single source of truth.
Without this, temporary AIAgent instances (memory flush, /compress) fall
Without this, temporary AIAgent instances (e.g. /compress) fall
back to the hardcoded default which fails when the active provider is
openai-codex.
"""
@ -915,129 +915,6 @@ class GatewayRunner:
e,
)
# -----------------------------------------------------------------
def _flush_memories_for_session(
self,
old_session_id: str,
session_key: Optional[str] = None,
):
"""Prompt the agent to save memories/skills before context is lost.
Synchronous worker meant to be called via run_in_executor from
an async context so it doesn't block the event loop.
"""
# Skip cron sessions — they run headless with no meaningful user
# conversation to extract memories from.
if old_session_id and old_session_id.startswith("cron_"):
logger.debug("Skipping memory flush for cron session: %s", old_session_id)
return
try:
history = self.session_store.load_transcript(old_session_id)
if not history or len(history) < 4:
return
from run_agent import AIAgent
model, runtime_kwargs = self._resolve_session_agent_runtime(
session_key=session_key,
)
if not runtime_kwargs.get("api_key"):
return
tmp_agent = AIAgent(
**runtime_kwargs,
model=model,
max_iterations=8,
quiet_mode=True,
skip_memory=True, # Flush agent — no memory provider
enabled_toolsets=["memory", "skills"],
session_id=old_session_id,
)
try:
# Fully silence the flush agent — quiet_mode only suppresses init
# messages; tool call output still leaks to the terminal through
# _safe_print → _print_fn. Set a no-op to prevent that.
tmp_agent._print_fn = lambda *a, **kw: None
# Build conversation history from transcript
msgs = [
{"role": m.get("role"), "content": m.get("content")}
for m in history
if m.get("role") in ("user", "assistant") and m.get("content")
]
# Read live memory state from disk so the flush agent can see
# what's already saved and avoid overwriting newer entries.
_current_memory = ""
try:
from tools.memory_tool import get_memory_dir
_mem_dir = get_memory_dir()
for fname, label in [
("MEMORY.md", "MEMORY (your personal notes)"),
("USER.md", "USER PROFILE (who the user is)"),
]:
fpath = _mem_dir / fname
if fpath.exists():
content = fpath.read_text(encoding="utf-8").strip()
if content:
_current_memory += f"\n\n## Current {label}:\n{content}"
except Exception:
pass # Non-fatal — flush still works, just without the guard
# Give the agent a real turn to think about what to save
flush_prompt = (
"[System: This session is about to be automatically reset due to "
"inactivity or a scheduled daily reset. The conversation context "
"will be cleared after this turn.\n\n"
"Review the conversation above and:\n"
"1. Save any important facts, preferences, or decisions to memory "
"(user profile or your notes) that would be useful in future sessions.\n"
"2. If you discovered a reusable workflow or solved a non-trivial "
"problem, consider saving it as a skill.\n"
"3. If nothing is worth saving, that's fine — just skip.\n\n"
)
if _current_memory:
flush_prompt += (
"IMPORTANT — here is the current live state of memory. Other "
"sessions, cron jobs, or the user may have updated it since this "
"conversation ended. Do NOT overwrite or remove entries unless "
"the conversation above reveals something that genuinely "
"supersedes them. Only add new information that is not already "
"captured below."
f"{_current_memory}\n\n"
)
flush_prompt += (
"Do NOT respond to the user. Just use the memory and skill_manage "
"tools if needed, then stop.]"
)
tmp_agent.run_conversation(
user_message=flush_prompt,
conversation_history=msgs,
)
finally:
self._cleanup_agent_resources(tmp_agent)
logger.info("Pre-reset memory flush completed for session %s", old_session_id)
except Exception as e:
logger.debug("Pre-reset memory flush failed for session %s: %s", old_session_id, e)
async def _async_flush_memories(
self,
old_session_id: str,
session_key: Optional[str] = None,
):
"""Run the sync memory flush in a thread pool so it won't block the event loop."""
loop = asyncio.get_running_loop()
await loop.run_in_executor(
None,
self._flush_memories_for_session,
old_session_id,
session_key,
)
@property
def should_exit_cleanly(self) -> bool:
return self._exit_cleanly
@ -2272,7 +2149,7 @@ class GatewayRunner:
except Exception as e:
logger.error("Recovered watcher setup error: %s", e)
# Start background session expiry watcher for proactive memory flushing
# Start background session expiry watcher to finalize expired sessions
asyncio.create_task(self._session_expiry_watcher())
# Start background reconnection watcher for platforms that failed at startup
@ -2289,25 +2166,24 @@ class GatewayRunner:
return True
async def _session_expiry_watcher(self, interval: int = 300):
"""Background task that proactively flushes memories for expired sessions.
Runs every `interval` seconds (default 5 min). For each session that
has expired according to its reset policy, flushes memories in a thread
pool and marks the session so it won't be flushed again.
"""Background task that finalizes expired sessions.
This means memories are already saved by the time the user sends their
next message, so there's no blocking delay.
Runs every ``interval`` seconds (default 5 min). For each session
whose reset policy has expired, invokes ``on_session_finalize``
hooks, cleans up the cached AIAgent's tool resources, evicts the
cache entry so it can be garbage-collected, and marks the session
so it won't be finalized again.
"""
await asyncio.sleep(60) # initial delay — let the gateway fully start
_flush_failures: dict[str, int] = {} # session_id -> consecutive failure count
_MAX_FLUSH_RETRIES = 3
_finalize_failures: dict[str, int] = {} # session_id -> consecutive failure count
_MAX_FINALIZE_RETRIES = 3
while self._running:
try:
self.session_store._ensure_loaded()
# Collect expired sessions first, then log a single summary.
_expired_entries = []
for key, entry in list(self.session_store._entries.items()):
if entry.memory_flushed:
if entry.expiry_finalized:
continue
if not self.session_store._is_session_expired(entry):
continue
@ -2325,13 +2201,12 @@ class GatewayRunner:
f"{p}:{c}" for p, c in sorted(_platforms.items())
)
logger.info(
"Session expiry: %d sessions to flush (%s)",
"Session expiry: %d sessions to finalize (%s)",
len(_expired_entries), _plat_summary,
)
for key, entry in _expired_entries:
try:
await self._async_flush_memories(entry.session_id, key)
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_parts = key.split(":")
@ -2363,48 +2238,48 @@ class GatewayRunner:
# be garbage-collected. Otherwise the cache grows
# unbounded across the gateway's lifetime.
self._evict_cached_agent(key)
# Mark as flushed and persist to disk so the flag
# Mark as finalized and persist to disk so the flag
# survives gateway restarts.
with self.session_store._lock:
entry.memory_flushed = True
entry.expiry_finalized = True
self.session_store._save()
logger.debug(
"Memory flush completed for session %s",
"Session expiry finalized for %s",
entry.session_id,
)
_flush_failures.pop(entry.session_id, None)
_finalize_failures.pop(entry.session_id, None)
except Exception as e:
failures = _flush_failures.get(entry.session_id, 0) + 1
_flush_failures[entry.session_id] = failures
if failures >= _MAX_FLUSH_RETRIES:
failures = _finalize_failures.get(entry.session_id, 0) + 1
_finalize_failures[entry.session_id] = failures
if failures >= _MAX_FINALIZE_RETRIES:
logger.warning(
"Memory flush gave up after %d attempts for %s: %s. "
"Marking as flushed to prevent infinite retry loop.",
"Session finalize gave up after %d attempts for %s: %s. "
"Marking as finalized to prevent infinite retry loop.",
failures, entry.session_id, e,
)
with self.session_store._lock:
entry.memory_flushed = True
entry.expiry_finalized = True
self.session_store._save()
_flush_failures.pop(entry.session_id, None)
_finalize_failures.pop(entry.session_id, None)
else:
logger.debug(
"Memory flush failed (%d/%d) for %s: %s",
failures, _MAX_FLUSH_RETRIES, entry.session_id, e,
"Session finalize failed (%d/%d) for %s: %s",
failures, _MAX_FINALIZE_RETRIES, entry.session_id, e,
)
if _expired_entries:
_flushed = sum(
1 for _, e in _expired_entries if e.memory_flushed
_done = sum(
1 for _, e in _expired_entries if e.expiry_finalized
)
_failed = len(_expired_entries) - _flushed
_failed = len(_expired_entries) - _done
if _failed:
logger.info(
"Session expiry done: %d flushed, %d pending retry",
_flushed, _failed,
"Session expiry done: %d finalized, %d pending retry",
_done, _failed,
)
else:
logger.info(
"Session expiry done: %d flushed", _flushed,
"Session expiry done: %d finalized", _done,
)
# Sweep agents that have been idle beyond the TTL regardless
@ -5021,19 +4896,11 @@ class GatewayRunner:
# Get existing session key
session_key = self._session_key_for_source(source)
self._invalidate_session_run_generation(session_key, reason="session_reset")
# Flush memories in the background (fire-and-forget) so the user
# gets the "Session reset!" response immediately.
try:
old_entry = self.session_store._entries.get(session_key)
if old_entry:
_flush_task = asyncio.create_task(
self._async_flush_memories(old_entry.session_id, session_key)
)
self._background_tasks.add(_flush_task)
_flush_task.add_done_callback(self._background_tasks.discard)
except Exception as e:
logger.debug("Gateway memory flush on reset failed: %s", e)
# Snapshot the old entry so on_session_finalize can report the
# expiring session id before reset_session() rotates it.
old_entry = self.session_store._entries.get(session_key)
# Close tool resources on the old agent (terminal sandboxes, browser
# daemons, background processes) before evicting from cache.
# Guard with getattr because test fixtures may skip __init__.
@ -7252,16 +7119,6 @@ class GatewayRunner:
if current_entry.session_id == target_id:
return f"📌 Already on session **{name}**."
# Flush memories for current session before switching
try:
_flush_task = asyncio.create_task(
self._async_flush_memories(current_entry.session_id, session_key)
)
self._background_tasks.add(_flush_task)
_flush_task.add_done_callback(self._background_tasks.discard)
except Exception as e:
logger.debug("Memory flush on resume failed: %s", e)
# Clear any running agent for this session key
self._release_running_agent_state(session_key)

View File

@ -439,11 +439,11 @@ class SessionEntry:
auto_reset_reason: Optional[str] = None # "idle" or "daily"
reset_had_activity: bool = False # whether the expired session had any messages
# Set by the background expiry watcher after it successfully flushes
# memories for this session. Persisted to sessions.json so the flag
# survives gateway restarts (the old in-memory _pre_flushed_sessions
# set was lost on restart, causing redundant re-flushes).
memory_flushed: bool = False
# Set by the background expiry watcher after it finalizes an expired
# session (invoking on_session_finalize hooks and evicting the cached
# agent). Persisted to sessions.json so the flag survives gateway
# restarts — prevents redundant finalization runs.
expiry_finalized: bool = False
# When True the next call to get_or_create_session() will auto-reset
# this session (create a new session_id) so the user starts fresh.
@ -479,7 +479,7 @@ class SessionEntry:
"last_prompt_tokens": self.last_prompt_tokens,
"estimated_cost_usd": self.estimated_cost_usd,
"cost_status": self.cost_status,
"memory_flushed": self.memory_flushed,
"expiry_finalized": self.expiry_finalized,
"suspended": self.suspended,
"resume_pending": self.resume_pending,
"resume_reason": self.resume_reason,
@ -531,7 +531,7 @@ class SessionEntry:
last_prompt_tokens=data.get("last_prompt_tokens", 0),
estimated_cost_usd=data.get("estimated_cost_usd", 0.0),
cost_status=data.get("cost_status", "unknown"),
memory_flushed=data.get("memory_flushed", False),
expiry_finalized=data.get("expiry_finalized", data.get("memory_flushed", False)),
suspended=data.get("suspended", False),
resume_pending=data.get("resume_pending", False),
resume_reason=data.get("resume_reason"),

View File

@ -612,14 +612,6 @@ DEFAULT_CONFIG = {
"timeout": 30,
"extra_body": {},
},
"flush_memories": {
"provider": "auto",
"model": "",
"base_url": "",
"api_key": "",
"timeout": 30,
"extra_body": {},
},
"title_generation": {
"provider": "auto",
"model": "",

View File

@ -1707,7 +1707,6 @@ _AUX_TASKS: list[tuple[str, str, str]] = [
("session_search", "Session search", "past-conversation recall"),
("approval", "Approval", "smart command approval"),
("mcp", "MCP", "MCP tool reasoning"),
("flush_memories", "Flush memories", "memory consolidation"),
("title_generation", "Title generation", "session titles"),
("skills_hub", "Skills hub", "skills search/install"),
]

View File

@ -43,7 +43,7 @@ _TIMEOUT = 30.0
# ---------------------------------------------------------------------------
# Process-level atexit safety net — ensures pending sessions are committed
# even if shutdown_memory_provider is never called (e.g. gateway crash,
# SIGKILL, or exception in _async_flush_memories preventing shutdown).
# SIGKILL, or exception in the session expiry watcher preventing shutdown).
# ---------------------------------------------------------------------------
_last_active_provider: Optional["OpenVikingMemoryProvider"] = None

View File

@ -1578,7 +1578,6 @@ class AIAgent:
self._memory_enabled = False
self._user_profile_enabled = False
self._memory_nudge_interval = 10
self._memory_flush_min_turns = 6
self._turns_since_memory = 0
self._iters_since_skill = 0
if not skip_memory:
@ -1587,7 +1586,6 @@ class AIAgent:
self._memory_enabled = mem_config.get("memory_enabled", False)
self._user_profile_enabled = mem_config.get("user_profile_enabled", False)
self._memory_nudge_interval = int(mem_config.get("nudge_interval", 10))
self._memory_flush_min_turns = int(mem_config.get("flush_min_turns", 6))
if self._memory_enabled or self._user_profile_enabled:
from tools.memory_tool import MemoryStore
self._memory_store = MemoryStore(
@ -2427,23 +2425,12 @@ class AIAgent:
# above guarantees aux_context >= MINIMUM_CONTEXT_LENGTH,
# so the new threshold is always >= 64K.
#
# Headroom: the threshold budgets RAW MESSAGES only, but the
# actual request auxiliary callers send also includes the
# system prompt and every tool schema. With 50+ tools that
# overhead can be 25-30K tokens; setting new_threshold =
# aux_context directly would let messages grow right to the
# aux limit and the first compression/flush request would
# overflow with HTTP 400. Subtract a dynamic headroom
# estimate so the full request still fits.
from agent.model_metadata import estimate_request_tokens_rough
tool_overhead = estimate_request_tokens_rough([], tools=self.tools)
# System prompt is not yet built at __init__ time; allow a
# conservative 10K budget (SOUL/AGENTS.md + memory snapshot +
# skills guidance) plus 2K for the flush instruction and a
# small safety margin.
headroom = tool_overhead + 12_000
# The compression summariser sends a single user-role
# prompt (no system prompt, no tools) to the aux model, so
# new_threshold == aux_context is safe: the request is
# the raw messages plus a small summarisation instruction.
old_threshold = threshold
new_threshold = max(aux_context - headroom, MINIMUM_CONTEXT_LENGTH)
new_threshold = aux_context
self.context_compressor.threshold_tokens = new_threshold
# Keep threshold_percent in sync so future main-model
# context_length changes (update_model) re-derive from a
@ -7927,254 +7914,6 @@ class AIAgent:
"""
return self.api_mode != "codex_responses"
def flush_memories(self, messages: list = None, min_turns: int = None):
"""Give the model one turn to persist memories before context is lost.
Called before compression, session reset, or CLI exit. Injects a flush
message, makes one API call, executes any memory tool calls, then
strips all flush artifacts from the message list.
Args:
messages: The current conversation messages. If None, uses
self._session_messages (last run_conversation state).
min_turns: Minimum user turns required to trigger the flush.
None = use config value (flush_min_turns).
0 = always flush (used for compression).
"""
if self._memory_flush_min_turns == 0 and min_turns is None:
return
if "memory" not in self.valid_tool_names or not self._memory_store:
return
effective_min = min_turns if min_turns is not None else self._memory_flush_min_turns
if self._user_turn_count < effective_min:
return
if messages is None:
messages = getattr(self, '_session_messages', None)
if not messages or len(messages) < 3:
return
flush_content = (
"[System: The session is being compressed. "
"Save anything worth remembering — prioritize user preferences, "
"corrections, and recurring patterns over task-specific details.]"
)
_sentinel = f"__flush_{id(self)}_{time.monotonic()}"
flush_msg = {"role": "user", "content": flush_content, "_flush_sentinel": _sentinel}
messages.append(flush_msg)
try:
# Build API messages for the flush call
_needs_sanitize = self._should_sanitize_tool_calls()
api_messages = []
for msg in messages:
api_msg = msg.copy()
self._copy_reasoning_content_for_api(msg, api_msg)
api_msg.pop("reasoning", None)
api_msg.pop("finish_reason", None)
api_msg.pop("_flush_sentinel", None)
api_msg.pop("_thinking_prefill", None)
if _needs_sanitize:
self._sanitize_tool_calls_for_strict_api(api_msg)
api_messages.append(api_msg)
if self._cached_system_prompt:
api_messages = [{"role": "system", "content": self._cached_system_prompt}] + api_messages
# Make one API call with only the memory tool available
memory_tool_def = None
for t in (self.tools or []):
if t.get("function", {}).get("name") == "memory":
memory_tool_def = t
break
if not memory_tool_def:
messages.pop() # remove flush msg
return
# Use auxiliary client for the flush call when available --
# it's cheaper and avoids Codex Responses API incompatibility.
from agent.auxiliary_client import (
call_llm as _call_llm,
_fixed_temperature_for_model,
OMIT_TEMPERATURE,
)
_aux_available = True
# Kimi models manage temperature server-side — omit it entirely.
# Other models with a fixed contract get that value; everyone else
# gets the historical 0.3 default.
_fixed_temp = _fixed_temperature_for_model(self.model, self.base_url)
_omit_temperature = _fixed_temp is OMIT_TEMPERATURE
if _omit_temperature:
_flush_temperature = None
elif _fixed_temp is not None:
_flush_temperature = _fixed_temp
else:
_flush_temperature = 0.3
aux_error = None
try:
response = _call_llm(
task="flush_memories",
messages=api_messages,
tools=[memory_tool_def],
temperature=_flush_temperature,
max_tokens=5120,
# timeout resolved from auxiliary.flush_memories.timeout config
)
except Exception as e:
aux_error = e
_aux_available = False
response = None
if not _aux_available and self.api_mode == "codex_responses":
# No auxiliary client -- use the Codex Responses path directly.
# The Responses API does not accept `temperature` on any
# supported backend (chatgpt.com/backend-api/codex rejects it
# outright; api.openai.com + gpt-5/o-series reasoning models
# and Copilot Responses reject it on reasoning models). The
# transport intentionally never sets it — strip any leftover
# here so the flush fallback matches the main-loop behavior.
codex_kwargs = self._build_api_kwargs(api_messages)
_ct_flush = self._get_transport()
if _ct_flush is not None:
codex_kwargs["tools"] = _ct_flush.convert_tools([memory_tool_def])
elif not codex_kwargs.get("tools"):
codex_kwargs["tools"] = [memory_tool_def]
codex_kwargs.pop("temperature", None)
if "max_output_tokens" in codex_kwargs:
codex_kwargs["max_output_tokens"] = 5120
response = self._run_codex_stream(codex_kwargs)
elif not _aux_available and self.api_mode == "anthropic_messages":
# Native Anthropic — use the transport for kwargs
_tflush = self._get_transport()
ant_kwargs = _tflush.build_kwargs(
model=self.model, messages=api_messages,
tools=[memory_tool_def], max_tokens=5120,
reasoning_config=None,
preserve_dots=self._anthropic_preserve_dots(),
)
response = self._anthropic_messages_create(ant_kwargs)
elif not _aux_available:
api_kwargs = {
"model": self.model,
"messages": api_messages,
"tools": [memory_tool_def],
**self._max_tokens_param(5120),
}
if _flush_temperature is not None:
api_kwargs["temperature"] = _flush_temperature
from agent.auxiliary_client import _get_task_timeout
response = self._ensure_primary_openai_client(reason="flush_memories").chat.completions.create(
**api_kwargs, timeout=_get_task_timeout("flush_memories")
)
if aux_error is not None:
logger.warning("Auxiliary memory flush failed; used fallback path: %s", aux_error)
self._emit_auxiliary_failure("memory flush", aux_error)
def _openai_tool_calls(resp):
if resp is not None and hasattr(resp, "choices") and resp.choices:
msg = getattr(resp.choices[0], "message", None)
calls = getattr(msg, "tool_calls", None)
if calls:
return calls
return []
def _codex_output_tool_calls(resp):
calls = []
for item in getattr(resp, "output", []) or []:
if getattr(item, "type", None) == "function_call":
calls.append(SimpleNamespace(
id=getattr(item, "call_id", None),
type="function",
function=SimpleNamespace(
name=getattr(item, "name", ""),
arguments=getattr(item, "arguments", "{}"),
),
))
return calls
# Extract tool calls from the response, handling all API formats
tool_calls = []
if self.api_mode == "codex_responses" and not _aux_available:
_ct_flush = self._get_transport()
_cnr_flush = _ct_flush.normalize_response(response) if _ct_flush is not None else None
if _cnr_flush and _cnr_flush.tool_calls:
tool_calls = [
SimpleNamespace(
id=tc.id, type="function",
function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
) for tc in _cnr_flush.tool_calls
]
else:
tool_calls = _codex_output_tool_calls(response)
elif self.api_mode == "anthropic_messages" and not _aux_available:
_tfn = self._get_transport()
_flush_result = _tfn.normalize_response(response, strip_tool_prefix=self._is_anthropic_oauth)
if _flush_result and _flush_result.tool_calls:
tool_calls = [
SimpleNamespace(
id=tc.id, type="function",
function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
) for tc in _flush_result.tool_calls
]
elif self.api_mode in ("chat_completions", "bedrock_converse"):
# chat_completions / bedrock — normalize through transport
_tfn = self._get_transport()
_flush_result = _tfn.normalize_response(response) if _tfn is not None else None
if _flush_result and _flush_result.tool_calls:
tool_calls = _flush_result.tool_calls
else:
tool_calls = _openai_tool_calls(response)
elif _aux_available and hasattr(response, "choices") and response.choices:
# Auxiliary client returned OpenAI-shaped response while main
# api_mode is codex/anthropic — extract tool_calls from .choices
tool_calls = _openai_tool_calls(response)
for tc in tool_calls:
if tc.function.name == "memory":
try:
args = json.loads(tc.function.arguments)
flush_target = args.get("target", "memory")
from tools.memory_tool import memory_tool as _memory_tool
_memory_tool(
action=args.get("action"),
target=flush_target,
content=args.get("content"),
old_text=args.get("old_text"),
store=self._memory_store,
)
if self._memory_manager and args.get("action") in ("add", "replace"):
try:
self._memory_manager.on_memory_write(
args.get("action", ""),
flush_target,
args.get("content", ""),
metadata=self._build_memory_write_metadata(
write_origin="memory_flush",
execution_context="flush_memories",
),
)
except Exception:
pass
if not self.quiet_mode:
print(f" 🧠 Memory flush: saved to {args.get('target', 'memory')}")
except Exception as e:
logger.warning("Memory flush tool call failed: %s", e)
self._emit_auxiliary_failure("memory flush tool", e)
except Exception as e:
logger.warning("Memory flush API call failed: %s", e)
self._emit_auxiliary_failure("memory flush", e)
finally:
# Strip flush artifacts: remove everything from the flush message onward.
# Use sentinel marker instead of identity check for robustness.
while messages and messages[-1].get("_flush_sentinel") != _sentinel:
messages.pop()
if not messages:
break
if messages and messages[-1].get("_flush_sentinel") == _sentinel:
messages.pop()
def _compress_context(self, messages: list, system_message: str, *, approx_tokens: int = None, task_id: str = "default", focus_topic: str = None) -> tuple:
"""Compress conversation context and split the session in SQLite.
@ -8193,8 +7932,6 @@ class AIAgent:
f"{approx_tokens:,}" if approx_tokens else "unknown", self.model,
focus_topic,
)
# Pre-compression memory flush: let the model save memories before they're lost
self.flush_memories(messages, min_turns=0)
# Notify external memory provider before compression discards context
if self._memory_manager:

View File

@ -386,7 +386,7 @@ class TestProvidersDictApiModeAnthropicMessages:
},
},
"auxiliary": {
"flush_memories": {
"compression": {
"provider": "myrelay",
"model": "claude-sonnet-4.6",
},
@ -399,11 +399,11 @@ class TestProvidersDictApiModeAnthropicMessages:
AnthropicAuxiliaryClient,
AsyncAnthropicAuxiliaryClient,
)
async_client, async_model = get_async_text_auxiliary_client("flush_memories")
async_client, async_model = get_async_text_auxiliary_client("compression")
assert isinstance(async_client, AsyncAnthropicAuxiliaryClient)
assert async_model == "claude-sonnet-4.6"
sync_client, sync_model = get_text_auxiliary_client("flush_memories")
sync_client, sync_model = get_text_auxiliary_client("compression")
assert isinstance(sync_client, AnthropicAuxiliaryClient)
assert sync_model == "claude-sonnet-4.6"

View File

@ -1,7 +1,7 @@
"""Regression tests for the universal "unsupported temperature" retry in
``agent.auxiliary_client``.
Auxiliary callers (``flush_memories``, context compression, session search,
Auxiliary callers (context compression, session search,
web extract summarisation, etc.) hardcode ``temperature=0.3`` for historical
reasons. Several provider/model combinations reject ``temperature`` with a
400:
@ -100,7 +100,7 @@ class TestCallLlmUnsupportedTemperatureRetry:
side_effect=lambda resp, _task: resp),
):
result = call_llm(
task="flush_memories",
task="compression",
messages=[{"role": "user", "content": "remember this"}],
temperature=0.3,
max_tokens=500,
@@ -136,7 +136,7 @@ class TestCallLlmUnsupportedTemperatureRetry:
):
with pytest.raises(RuntimeError, match="Invalid value"):
call_llm(
task="flush_memories",
task="compression",
messages=[{"role": "user", "content": "x"}],
temperature=0.3,
max_tokens=500,
@@ -166,7 +166,7 @@ class TestCallLlmUnsupportedTemperatureRetry:
):
with pytest.raises(RuntimeError):
call_llm(
task="flush_memories",
task="compression",
messages=[{"role": "user", "content": "x"}],
temperature=None, # explicit: no temperature sent
max_tokens=500,
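
The retry these tests pin down can be sketched as a small wrapper (an assumed shape: the real `call_llm` in `agent.auxiliary_client` likely matches the provider's 400 error body more precisely than a substring check):

```python
# Minimal sketch of the universal "unsupported temperature" retry. Assumes the
# provider signals rejection with an error message naming the parameter.
def call_with_temperature_retry(call_fn, *, temperature=0.3, **kwargs):
    if temperature is None:
        # Explicit opt-out: temperature is never sent, so there is nothing to retry.
        return call_fn(**kwargs)
    try:
        return call_fn(temperature=temperature, **kwargs)
    except Exception as exc:
        if "temperature" in str(exc).lower():
            # Provider rejected the parameter (HTTP 400): retry once without it.
            return call_fn(**kwargs)
        raise  # unrelated failure surfaces, matching the RuntimeError tests above
```

This mirrors the three behaviours asserted above: a successful retry when only `temperature` is rejected, the original error re-raised when the retry also fails or the error is unrelated, and no retry at all when the caller passed `temperature=None`.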

View File

@@ -33,7 +33,6 @@ class _FakeAgent:
self._todo_store.write(
[{"id": "t1", "content": "unfinished task", "status": "in_progress"}]
)
self.flush_memories = MagicMock()
self.commit_memory_session = MagicMock()
self._invalidate_system_prompt = MagicMock()
@@ -157,7 +156,6 @@ def test_new_command_creates_real_fresh_session_and_resets_agent_state(tmp_path)
assert cli.agent._todo_store.read() == []
assert cli.session_start > old_session_start
assert cli.agent.session_start == cli.session_start
cli.agent.flush_memories.assert_called_once_with([{"role": "user", "content": "hello"}])
cli.agent._invalidate_system_prompt.assert_called_once()

View File

@@ -1,249 +0,0 @@
"""Tests for proactive memory flush on session expiry.
Verifies that:
1. _is_session_expired() works from a SessionEntry alone (no source needed)
2. The sync callback is no longer called in get_or_create_session
3. memory_flushed flag persists across save/load cycles (prevents restart re-flush)
4. The background watcher can detect expired sessions
"""
import pytest
from datetime import datetime, timedelta
from pathlib import Path
from unittest.mock import patch, MagicMock
from gateway.config import Platform, GatewayConfig, SessionResetPolicy
from gateway.session import SessionSource, SessionStore, SessionEntry
@pytest.fixture()
def idle_store(tmp_path):
"""SessionStore with a 60-minute idle reset policy."""
config = GatewayConfig(
default_reset_policy=SessionResetPolicy(mode="idle", idle_minutes=60),
)
with patch("gateway.session.SessionStore._ensure_loaded"):
s = SessionStore(sessions_dir=tmp_path, config=config)
s._db = None
s._loaded = True
return s
@pytest.fixture()
def no_reset_store(tmp_path):
"""SessionStore with no reset policy (mode=none)."""
config = GatewayConfig(
default_reset_policy=SessionResetPolicy(mode="none"),
)
with patch("gateway.session.SessionStore._ensure_loaded"):
s = SessionStore(sessions_dir=tmp_path, config=config)
s._db = None
s._loaded = True
return s
class TestIsSessionExpired:
"""_is_session_expired should detect expiry from entry alone."""
def test_idle_session_expired(self, idle_store):
entry = SessionEntry(
session_key="agent:main:telegram:dm",
session_id="sid_1",
created_at=datetime.now() - timedelta(hours=3),
updated_at=datetime.now() - timedelta(minutes=120),
platform=Platform.TELEGRAM,
chat_type="dm",
)
assert idle_store._is_session_expired(entry) is True
def test_active_session_not_expired(self, idle_store):
entry = SessionEntry(
session_key="agent:main:telegram:dm",
session_id="sid_2",
created_at=datetime.now() - timedelta(hours=1),
updated_at=datetime.now() - timedelta(minutes=10),
platform=Platform.TELEGRAM,
chat_type="dm",
)
assert idle_store._is_session_expired(entry) is False
def test_none_mode_never_expires(self, no_reset_store):
entry = SessionEntry(
session_key="agent:main:telegram:dm",
session_id="sid_3",
created_at=datetime.now() - timedelta(days=30),
updated_at=datetime.now() - timedelta(days=30),
platform=Platform.TELEGRAM,
chat_type="dm",
)
assert no_reset_store._is_session_expired(entry) is False
def test_active_processes_prevent_expiry(self, idle_store):
"""Sessions with active background processes should never expire."""
idle_store._has_active_processes_fn = lambda key: True
entry = SessionEntry(
session_key="agent:main:telegram:dm",
session_id="sid_4",
created_at=datetime.now() - timedelta(hours=5),
updated_at=datetime.now() - timedelta(hours=5),
platform=Platform.TELEGRAM,
chat_type="dm",
)
assert idle_store._is_session_expired(entry) is False
def test_daily_mode_expired(self, tmp_path):
"""Daily mode should expire sessions from before today's reset hour."""
config = GatewayConfig(
default_reset_policy=SessionResetPolicy(mode="daily", at_hour=4),
)
with patch("gateway.session.SessionStore._ensure_loaded"):
store = SessionStore(sessions_dir=tmp_path, config=config)
store._db = None
store._loaded = True
entry = SessionEntry(
session_key="agent:main:telegram:dm",
session_id="sid_5",
created_at=datetime.now() - timedelta(days=2),
updated_at=datetime.now() - timedelta(days=2),
platform=Platform.TELEGRAM,
chat_type="dm",
)
assert store._is_session_expired(entry) is True
class TestGetOrCreateSessionNoCallback:
"""get_or_create_session should NOT call a sync flush callback."""
def test_auto_reset_creates_new_session_after_flush(self, idle_store):
"""When a flushed session auto-resets, a new session_id is created."""
source = SessionSource(
platform=Platform.TELEGRAM,
chat_id="123",
chat_type="dm",
)
# Create initial session
entry1 = idle_store.get_or_create_session(source)
old_sid = entry1.session_id
# Simulate the watcher having flushed it
entry1.memory_flushed = True
# Simulate the session going idle
entry1.updated_at = datetime.now() - timedelta(minutes=120)
idle_store._save()
# Next call should auto-reset
entry2 = idle_store.get_or_create_session(source)
assert entry2.session_id != old_sid
assert entry2.was_auto_reset is True
# New session starts with memory_flushed=False
assert entry2.memory_flushed is False
def test_no_sync_callback_invoked(self, idle_store):
"""No synchronous callback should block during auto-reset."""
source = SessionSource(
platform=Platform.TELEGRAM,
chat_id="123",
chat_type="dm",
)
entry1 = idle_store.get_or_create_session(source)
entry1.updated_at = datetime.now() - timedelta(minutes=120)
idle_store._save()
# Verify no _on_auto_reset attribute
assert not hasattr(idle_store, '_on_auto_reset')
# This should NOT block (no sync LLM call)
entry2 = idle_store.get_or_create_session(source)
assert entry2.was_auto_reset is True
class TestMemoryFlushedFlag:
"""The memory_flushed flag on SessionEntry prevents double-flushing."""
def test_defaults_to_false(self):
entry = SessionEntry(
session_key="agent:main:telegram:dm:123",
session_id="sid_new",
created_at=datetime.now(),
updated_at=datetime.now(),
platform=Platform.TELEGRAM,
chat_type="dm",
)
assert entry.memory_flushed is False
def test_persists_through_save_load(self, idle_store):
"""memory_flushed=True must survive a save/load cycle (simulates restart)."""
key = "agent:main:discord:thread:789"
entry = SessionEntry(
session_key=key,
session_id="sid_flushed",
created_at=datetime.now() - timedelta(hours=5),
updated_at=datetime.now() - timedelta(hours=5),
platform=Platform.DISCORD,
chat_type="thread",
memory_flushed=True,
)
idle_store._entries[key] = entry
idle_store._save()
# Simulate restart: clear in-memory state, reload from disk
idle_store._entries.clear()
idle_store._loaded = False
idle_store._ensure_loaded()
reloaded = idle_store._entries[key]
assert reloaded.memory_flushed is True
def test_unflushed_entry_survives_restart_as_unflushed(self, idle_store):
"""An entry without memory_flushed stays False after reload."""
key = "agent:main:telegram:dm:456"
entry = SessionEntry(
session_key=key,
session_id="sid_not_flushed",
created_at=datetime.now() - timedelta(hours=2),
updated_at=datetime.now() - timedelta(hours=2),
platform=Platform.TELEGRAM,
chat_type="dm",
)
idle_store._entries[key] = entry
idle_store._save()
idle_store._entries.clear()
idle_store._loaded = False
idle_store._ensure_loaded()
reloaded = idle_store._entries[key]
assert reloaded.memory_flushed is False
def test_roundtrip_to_dict_from_dict(self):
"""to_dict/from_dict must preserve memory_flushed."""
entry = SessionEntry(
session_key="agent:main:telegram:dm:999",
session_id="sid_rt",
created_at=datetime.now(),
updated_at=datetime.now(),
platform=Platform.TELEGRAM,
chat_type="dm",
memory_flushed=True,
)
d = entry.to_dict()
assert d["memory_flushed"] is True
restored = SessionEntry.from_dict(d)
assert restored.memory_flushed is True
def test_legacy_entry_without_field_defaults_false(self):
"""Old sessions.json entries missing memory_flushed should default to False."""
data = {
"session_key": "agent:main:telegram:dm:legacy",
"session_id": "sid_legacy",
"created_at": datetime.now().isoformat(),
"updated_at": datetime.now().isoformat(),
"platform": "telegram",
"chat_type": "dm",
# no memory_flushed key
}
entry = SessionEntry.from_dict(data)
assert entry.memory_flushed is False

View File

@@ -1,240 +0,0 @@
"""Tests for memory flush stale-overwrite prevention (#2670).
Verifies that:
1. Cron sessions are skipped (no flush for headless cron runs)
2. Current memory state is injected into the flush prompt so the
flush agent can see what's already saved and avoid overwrites
3. The flush still works normally when memory files don't exist
"""
import sys
import types
import pytest
from pathlib import Path
from unittest.mock import MagicMock, patch, call
@pytest.fixture(autouse=True)
def _mock_dotenv(monkeypatch):
"""gateway.run imports dotenv at module level; stub it so tests run without the package."""
fake = types.ModuleType("dotenv")
fake.load_dotenv = lambda *a, **kw: None
monkeypatch.setitem(sys.modules, "dotenv", fake)
def _make_runner():
from gateway.run import GatewayRunner
runner = object.__new__(GatewayRunner)
runner._honcho_managers = {}
runner._honcho_configs = {}
runner._running_agents = {}
runner._pending_messages = {}
runner._pending_approvals = {}
runner.adapters = {}
runner.hooks = MagicMock()
runner.session_store = MagicMock()
return runner
_TRANSCRIPT_4_MSGS = [
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "hi there"},
{"role": "user", "content": "remember my name is Alice"},
{"role": "assistant", "content": "Got it, Alice!"},
]
class TestCronSessionBypass:
"""Cron sessions should never trigger a memory flush."""
def test_cron_session_skipped(self):
runner = _make_runner()
runner._flush_memories_for_session("cron_job123_20260323_120000")
# session_store.load_transcript should never be called
runner.session_store.load_transcript.assert_not_called()
def test_cron_session_with_prefix_skipped(self):
"""Cron sessions with different prefixes are still skipped."""
runner = _make_runner()
runner._flush_memories_for_session("cron_daily_20260323")
runner.session_store.load_transcript.assert_not_called()
def test_non_cron_session_proceeds(self):
"""Non-cron sessions should still attempt the flush."""
runner = _make_runner()
runner.session_store.load_transcript.return_value = []
runner._flush_memories_for_session("session_abc123")
runner.session_store.load_transcript.assert_called_once_with("session_abc123")
def _make_flush_context(monkeypatch, memory_dir=None):
"""Return (runner, tmp_agent, fake_run_agent) with run_agent mocked in sys.modules."""
tmp_agent = MagicMock()
fake_run_agent = types.ModuleType("run_agent")
fake_run_agent.AIAgent = MagicMock(return_value=tmp_agent)
monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent)
runner = _make_runner()
runner.session_store.load_transcript.return_value = _TRANSCRIPT_4_MSGS
return runner, tmp_agent, memory_dir
class TestMemoryInjection:
"""The flush prompt should include current memory state from disk."""
def test_memory_content_injected_into_flush_prompt(self, tmp_path, monkeypatch):
"""When memory files exist, their content appears in the flush prompt."""
memory_dir = tmp_path / "memories"
memory_dir.mkdir()
(memory_dir / "MEMORY.md").write_text("Agent knows Python\n§\nUser prefers dark mode")
(memory_dir / "USER.md").write_text("Name: Alice\n§\nTimezone: PST")
runner, tmp_agent, _ = _make_flush_context(monkeypatch, memory_dir)
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "k"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch.dict("sys.modules", {"tools.memory_tool": MagicMock(get_memory_dir=lambda: memory_dir)}),
):
runner._flush_memories_for_session("session_123")
tmp_agent.run_conversation.assert_called_once()
flush_prompt = tmp_agent.run_conversation.call_args.kwargs.get("user_message", "")
assert "Agent knows Python" in flush_prompt
assert "User prefers dark mode" in flush_prompt
assert "Name: Alice" in flush_prompt
assert "Timezone: PST" in flush_prompt
assert "Do NOT overwrite or remove entries" in flush_prompt
assert "current live state of memory" in flush_prompt
def test_flush_works_without_memory_files(self, tmp_path, monkeypatch):
"""When no memory files exist, flush still runs without the guard."""
empty_dir = tmp_path / "no_memories"
empty_dir.mkdir()
runner, tmp_agent, _ = _make_flush_context(monkeypatch)
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "k"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch.dict("sys.modules", {"tools.memory_tool": MagicMock(get_memory_dir=lambda: empty_dir)}),
):
runner._flush_memories_for_session("session_456")
tmp_agent.run_conversation.assert_called_once()
flush_prompt = tmp_agent.run_conversation.call_args.kwargs.get("user_message", "")
assert "Do NOT overwrite or remove entries" not in flush_prompt
assert "Review the conversation above" in flush_prompt
def test_empty_memory_files_no_injection(self, tmp_path, monkeypatch):
"""Empty memory files should not trigger the guard section."""
memory_dir = tmp_path / "memories"
memory_dir.mkdir()
(memory_dir / "MEMORY.md").write_text("")
(memory_dir / "USER.md").write_text(" \n ") # whitespace only
runner, tmp_agent, _ = _make_flush_context(monkeypatch)
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "k"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch.dict("sys.modules", {"tools.memory_tool": MagicMock(get_memory_dir=lambda: memory_dir)}),
):
runner._flush_memories_for_session("session_789")
tmp_agent.run_conversation.assert_called_once()
flush_prompt = tmp_agent.run_conversation.call_args.kwargs.get("user_message", "")
assert "current live state of memory" not in flush_prompt
class TestFlushAgentSilenced:
"""The flush agent must not produce any terminal output."""
def test_print_fn_set_to_noop(self, tmp_path, monkeypatch):
"""_print_fn on the flush agent must be a no-op so tool output never leaks."""
runner = _make_runner()
runner.session_store.load_transcript.return_value = _TRANSCRIPT_4_MSGS
captured_agent = {}
def _fake_ai_agent(*args, **kwargs):
agent = MagicMock()
captured_agent["instance"] = agent
return agent
fake_run_agent = types.ModuleType("run_agent")
fake_run_agent.AIAgent = _fake_ai_agent
monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent)
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "k"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch.dict("sys.modules", {"tools.memory_tool": MagicMock(get_memory_dir=lambda: tmp_path)}),
):
runner._flush_memories_for_session("session_silent")
agent = captured_agent["instance"]
assert agent._print_fn is not None, "_print_fn should be overridden to suppress output"
# Confirm it is callable and produces no output (no exception)
agent._print_fn("should be silenced")
def test_kawaii_spinner_respects_print_fn(self):
"""KawaiiSpinner must route all output through print_fn when supplied."""
from agent.display import KawaiiSpinner
written = []
spinner = KawaiiSpinner("test", print_fn=lambda *a, **kw: written.append(a))
spinner._write("hello")
assert written == [("hello",)], "spinner should route through print_fn"
# A no-op print_fn must produce no output to stdout
import io, sys
buf = io.StringIO()
old_stdout = sys.stdout
sys.stdout = buf
try:
silent_spinner = KawaiiSpinner("silent", print_fn=lambda *a, **kw: None)
silent_spinner._write("should not appear")
silent_spinner.stop("done")
finally:
sys.stdout = old_stdout
assert buf.getvalue() == "", "no-op print_fn spinner must not write to stdout"
def test_flush_agent_closes_resources_after_run(self, monkeypatch):
"""Memory flush should close temporary agent resources after the turn."""
runner, tmp_agent, _ = _make_flush_context(monkeypatch)
tmp_agent.shutdown_memory_provider = MagicMock()
tmp_agent.close = MagicMock()
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "k"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch.dict("sys.modules", {"tools.memory_tool": MagicMock(get_memory_dir=lambda: Path("/nonexistent"))}),
):
runner._flush_memories_for_session("session_cleanup")
tmp_agent.shutdown_memory_provider.assert_called_once()
tmp_agent.close.assert_called_once()
class TestFlushPromptStructure:
"""Verify the flush prompt retains its core instructions."""
def test_core_instructions_present(self, monkeypatch):
"""The flush prompt should still contain the original guidance."""
runner, tmp_agent, _ = _make_flush_context(monkeypatch)
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "k"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch.dict("sys.modules", {"tools.memory_tool": MagicMock(get_memory_dir=lambda: Path("/nonexistent"))}),
):
runner._flush_memories_for_session("session_struct")
flush_prompt = tmp_agent.run_conversation.call_args.kwargs.get("user_message", "")
assert "automatically reset" in flush_prompt
assert "Save any important facts" in flush_prompt
assert "consider saving it as a skill" in flush_prompt
assert "Do NOT respond to the user" in flush_prompt

View File

@@ -4,7 +4,7 @@ Tests the _handle_resume_command handler (switch to a previously-named session)
across gateway messenger platforms.
"""
from unittest.mock import MagicMock, AsyncMock
from unittest.mock import MagicMock
import pytest
@@ -53,9 +53,6 @@ def _make_runner(session_db=None, current_session_id="current_session_001",
mock_store.switch_session.return_value = mock_session_entry
runner.session_store = mock_store
# Stub out memory flushing
runner._async_flush_memories = AsyncMock()
return runner
@@ -233,28 +230,3 @@ class TestHandleResumeCommand:
assert real_key not in runner._running_agents
db.close()
@pytest.mark.asyncio
async def test_resume_flushes_memories(self, tmp_path):
"""Resume should flush memories from the current session before switching."""
from hermes_state import SessionDB
db = SessionDB(db_path=tmp_path / "state.db")
db.create_session("old_session", "telegram")
db.set_session_title("old_session", "Old Work")
db.create_session("current_session_001", "telegram")
event = _make_event(text="/resume Old Work")
runner = _make_runner(
session_db=db,
current_session_id="current_session_001",
event=event,
)
await runner._handle_resume_command(event)
runner._async_flush_memories.assert_called_once_with(
"current_session_001",
"agent:main:telegram:dm:67890",
)
db.close()

View File

@@ -177,8 +177,8 @@ async def test_idle_expiry_fires_finalize_hook(mock_invoke_hook):
its reset policy (idle timeout, scheduled reset), it must fire
``on_session_finalize`` so plugin providers get the same final-pass
extraction opportunity they'd get from /new or CLI shutdown. Before
the fix, the expiry path flushed memories and evicted the agent but
silently skipped the hook.
the fix, the expiry path evicted the agent but silently skipped the
hook.
"""
from datetime import datetime, timedelta
@@ -200,7 +200,7 @@ async def test_idle_expiry_fires_finalize_hook(mock_invoke_hook):
platform=Platform.TELEGRAM,
chat_type="dm",
)
expired_entry.memory_flushed = False
expired_entry.expiry_finalized = False
runner.session_store = MagicMock()
runner.session_store._ensure_loaded = MagicMock()
@@ -211,24 +211,24 @@ async def test_idle_expiry_fires_finalize_hook(mock_invoke_hook):
runner.session_store._lock.__exit__ = MagicMock(return_value=None)
runner.session_store._save = MagicMock()
runner._async_flush_memories = AsyncMock()
runner._evict_cached_agent = MagicMock()
runner._cleanup_agent_resources = MagicMock()
runner._sweep_idle_cached_agents = MagicMock(return_value=0)
# The watcher starts with `await asyncio.sleep(60)` and loops while
# `self._running`. Patch sleep so the 60s initial delay is instant, then
# flip `_running` false inside the flush call so the loop exits cleanly
# after one pass.
# `self._running`. Patch sleep so the 60s initial delay is instant, and
# make the expiry hook invocation flip `_running` false so the loop
# exits cleanly after one pass.
_orig_sleep = __import__("asyncio").sleep
async def _fast_sleep(_):
await _orig_sleep(0)
async def _flush_and_stop(session_id, key):
runner._running = False # terminate the loop after this iteration
def _hook_and_stop(*a, **kw):
runner._running = False
return None
runner._async_flush_memories = AsyncMock(side_effect=_flush_and_stop)
mock_invoke_hook.side_effect = _hook_and_stop
with patch("gateway.run.asyncio.sleep", side_effect=_fast_sleep):
await runner._session_expiry_watcher(interval=0)

View File

@@ -1,7 +1,7 @@
"""Regression tests for approval-state cleanup on session boundaries."""
from datetime import datetime
from unittest.mock import AsyncMock, MagicMock
from unittest.mock import MagicMock
import pytest
@@ -72,7 +72,6 @@ def _make_resume_runner():
runner = object.__new__(GatewayRunner)
runner.adapters = {}
runner._background_tasks = set()
runner._async_flush_memories = AsyncMock()
runner._running_agents = {}
runner._running_agents_ts = {}
runner._busy_ack_ts = {}

View File

@@ -31,7 +31,6 @@ def _make_agent_with_engine(engine):
agent._vprint = lambda *a, **kw: None
agent._last_flushed_db_idx = 0
# Stub the few AIAgent methods _compress_context uses.
agent.flush_memories = lambda *a, **kw: None
agent._invalidate_system_prompt = lambda *a, **kw: None
agent._build_system_prompt = lambda *a, **kw: "new-system-prompt"
agent.commit_memory_session = lambda *a, **kw: None

View File

@@ -41,8 +41,6 @@ def _make_agent(
agent.tool_progress_callback = None
agent._compression_warning = None
agent._aux_compression_context_length_config = None
# Tools feed into the headroom calculation in _check_compression_model_feasibility.
# Tests that want to assert specific threshold values can override this.
agent.tools = []
compressor = MagicMock(spec=ContextCompressor)
@@ -85,9 +83,8 @@ def test_auto_corrects_threshold_when_aux_context_below_threshold(mock_get_clien
assert "threshold:" in messages[0]
# Warning stored for gateway replay
assert agent._compression_warning is not None
# Threshold on the live compressor was actually lowered, accounting for
# the request-overhead headroom (empty tools list → ~12K headroom only).
assert agent.context_compressor.threshold_tokens == 68_000
# Threshold on the live compressor was actually lowered to aux_context.
assert agent.context_compressor.threshold_tokens == 80_000
@patch("agent.model_metadata.get_model_context_length", return_value=32_768)
@@ -346,93 +343,7 @@ def test_just_below_threshold_auto_corrects(mock_get_client, mock_ctx_len):
assert len(messages) == 1
assert "small-model" in messages[0]
assert "Auto-lowered" in messages[0]
assert agent.context_compressor.threshold_tokens == 87_999
# ── Headroom for system prompt + tool schemas ────────────────────────
@patch("agent.model_metadata.get_model_context_length", return_value=128_000)
@patch("agent.auxiliary_client.get_text_auxiliary_client")
def test_auto_lowered_threshold_reserves_headroom_for_tools_and_system(mock_get_client, mock_ctx_len):
"""When aux context binds the threshold, new_threshold must leave room
for the system prompt and tool schemas that auxiliary callers
(compression summariser, flush_memories) prepend to the message list.
Without headroom, a full-budget message window + ~25K system/tool
overhead overflows the aux model with HTTP 400. Regression guard for
the flush_memories-on-busy-toolset overflow path.
"""
# Main context 200K, threshold 70% = 140K. Aux pins at 128K (below
# threshold → triggers auto-correct).
agent = _make_agent(main_context=200_000, threshold_percent=0.70)
# Build a realistic tool schema load.
agent.tools = [
{
"type": "function",
"function": {
"name": f"tool_{i}",
"description": "x" * 200,
"parameters": {"type": "object", "properties": {"arg": {"type": "string", "description": "y" * 120}}},
},
}
for i in range(50)
]
mock_client = MagicMock()
mock_client.base_url = "https://openrouter.ai/api/v1"
mock_client.api_key = "sk-aux"
mock_get_client.return_value = (mock_client, "model-with-128k")
agent._emit_status = lambda msg: None
agent._check_compression_model_feasibility()
new_threshold = agent.context_compressor.threshold_tokens
# Must have strictly reserved headroom: new_threshold < aux_context.
assert new_threshold < 128_000, (
f"threshold {new_threshold} did not reserve headroom below aux=128,000 "
f"— system prompt + tools would overflow the aux model"
)
# Must respect the 64K hard floor.
from agent.model_metadata import MINIMUM_CONTEXT_LENGTH
assert new_threshold >= MINIMUM_CONTEXT_LENGTH
@patch("agent.model_metadata.get_model_context_length", return_value=80_000)
@patch("agent.auxiliary_client.get_text_auxiliary_client")
def test_headroom_floors_at_minimum_context(mock_get_client, mock_ctx_len):
"""If headroom subtraction would push below 64K floor, clamp to 64K
rather than refusing the session the aux is still workable for a
smaller message window.
"""
# Aux at 80K, with enough tools to push headroom > 16K → naive subtract
# would land at < 64K. The max(..., MINIMUM_CONTEXT_LENGTH) clamp must
# keep the session running.
agent = _make_agent(main_context=200_000, threshold_percent=0.50)
agent.tools = [
{
"type": "function",
"function": {
"name": f"tool_{i}",
"description": "z" * 2_000, # fat descriptions
"parameters": {},
},
}
for i in range(30)
]
mock_client = MagicMock()
mock_client.base_url = "https://openrouter.ai/api/v1"
mock_client.api_key = "sk-aux"
mock_get_client.return_value = (mock_client, "small-aux-model")
agent._emit_status = lambda msg: None
agent._check_compression_model_feasibility()
from agent.model_metadata import MINIMUM_CONTEXT_LENGTH
assert agent.context_compressor.threshold_tokens == MINIMUM_CONTEXT_LENGTH
assert agent.context_compressor.threshold_tokens == 99_999
# ── Two-phase: __init__ + run_conversation replay ───────────────────
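
With the headroom deduction deleted, the corrected-threshold logic these assertions exercise reduces to a clamp. A minimal sketch under assumed names (`MINIMUM_CONTEXT_LENGTH` of 64K is taken from the deleted tests above; the function name is illustrative, and the real `_check_compression_model_feasibility` does more than this):

```python
# Sketch of the simplified feasibility correction: when the auxiliary
# compression model's context window is smaller than the live compression
# threshold, lower the threshold to aux_context, clamped at the 64K floor.
# The summariser sends a single user-role prompt, so no system-prompt/tool
# headroom is reserved any more.
MINIMUM_CONTEXT_LENGTH = 64_000  # hard floor from the tests above

def corrected_threshold(threshold_tokens: int, aux_context: int) -> int:
    if aux_context >= threshold_tokens:
        return threshold_tokens  # aux model is big enough; leave unchanged
    return max(aux_context, MINIMUM_CONTEXT_LENGTH)

assert corrected_threshold(140_000, 80_000) == 80_000   # lowered straight to aux_context
assert corrected_threshold(100_000, 50_000) == 64_000   # clamped at the floor
assert corrected_threshold(60_000, 128_000) == 60_000   # aux larger: unchanged
```

This is the `new_threshold = aux_context` simplification from the change description: the previous version subtracted an estimated system-prompt-plus-tool-schema headroom before clamping, which was only needed while `flush_memories` dragged the full main-agent prefix into auxiliary calls.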

View File

@@ -1,398 +0,0 @@
"""Tests for flush_memories() working correctly across all provider modes.
Catches the bug where Codex mode called chat.completions.create on a
Responses-only client, which would fail silently or with a 404.
"""
import json
import os
import sys
import types
from types import SimpleNamespace
from unittest.mock import patch, MagicMock, call
import pytest
sys.modules.setdefault("fire", types.SimpleNamespace(Fire=lambda *a, **k: None))
sys.modules.setdefault("firecrawl", types.SimpleNamespace(Firecrawl=object))
sys.modules.setdefault("fal_client", types.SimpleNamespace())
import run_agent
class _FakeOpenAI:
def __init__(self, **kwargs):
self.kwargs = kwargs
self.api_key = kwargs.get("api_key", "test")
self.base_url = kwargs.get("base_url", "http://test")
def close(self):
pass
def _make_agent(monkeypatch, api_mode="chat_completions", provider="openrouter"):
"""Build an AIAgent with mocked internals, ready for flush_memories testing."""
monkeypatch.setattr(run_agent, "get_tool_definitions", lambda **kw: [
{
"type": "function",
"function": {
"name": "memory",
"description": "Manage memories.",
"parameters": {
"type": "object",
"properties": {
"action": {"type": "string"},
"target": {"type": "string"},
"content": {"type": "string"},
},
},
},
},
])
monkeypatch.setattr(run_agent, "check_toolset_requirements", lambda: {})
monkeypatch.setattr(run_agent, "OpenAI", _FakeOpenAI)
agent = run_agent.AIAgent(
api_key="test-key",
base_url="https://test.example.com/v1",
provider=provider,
api_mode=api_mode,
max_iterations=4,
quiet_mode=True,
skip_context_files=True,
skip_memory=True,
)
# Give it a valid memory store
agent._memory_store = MagicMock()
agent._memory_flush_min_turns = 1
agent._user_turn_count = 5
return agent
def _chat_response_with_memory_call():
"""Simulated chat completions response with a memory tool call."""
return SimpleNamespace(
choices=[SimpleNamespace(
finish_reason="tool_calls",
message=SimpleNamespace(
content=None,
tool_calls=[SimpleNamespace(
id="call_mem_0",
type="function",
function=SimpleNamespace(
name="memory",
arguments=json.dumps({
"action": "add",
"target": "notes",
"content": "User prefers dark mode.",
}),
),
)],
),
)],
usage=SimpleNamespace(prompt_tokens=100, completion_tokens=20, total_tokens=120),
)
class TestFlushMemoriesRespectsConfigTimeout:
"""flush_memories() must NOT hardcode timeout=30.0 — it should defer
to the config value via auxiliary.flush_memories.timeout."""
def test_auxiliary_path_omits_explicit_timeout(self, monkeypatch):
"""When calling _call_llm, timeout should NOT be passed so that
_get_task_timeout('flush_memories') reads from config."""
agent = _make_agent(monkeypatch, api_mode="chat_completions", provider="openrouter")
mock_response = _chat_response_with_memory_call()
with patch("agent.auxiliary_client.call_llm", return_value=mock_response) as mock_call:
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi"},
{"role": "user", "content": "Note this"},
]
with patch("tools.memory_tool.memory_tool", return_value="Saved."):
agent.flush_memories(messages)
mock_call.assert_called_once()
call_kwargs = mock_call.call_args
# timeout must NOT be explicitly passed (so _get_task_timeout resolves it)
assert "timeout" not in call_kwargs.kwargs, (
"flush_memories should not pass explicit timeout to _call_llm; "
"let _get_task_timeout('flush_memories') resolve from config"
)
def test_fallback_path_uses_config_timeout(self, monkeypatch):
"""When auxiliary client is unavailable and we fall back to direct
OpenAI client, timeout should come from _get_task_timeout, not hardcoded."""
agent = _make_agent(monkeypatch, api_mode="chat_completions", provider="openrouter")
agent.client = MagicMock()
agent.client.chat.completions.create.return_value = _chat_response_with_memory_call()
custom_timeout = 180.0
with patch("agent.auxiliary_client.call_llm", side_effect=RuntimeError("no provider")), \
patch("agent.auxiliary_client._get_task_timeout", return_value=custom_timeout) as mock_gtt, \
patch("tools.memory_tool.memory_tool", return_value="Saved."):
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi"},
{"role": "user", "content": "Save this"},
]
agent.flush_memories(messages)
mock_gtt.assert_called_once_with("flush_memories")
agent.client.chat.completions.create.assert_called_once()
call_kwargs = agent.client.chat.completions.create.call_args
assert call_kwargs.kwargs.get("timeout") == custom_timeout, (
f"Expected timeout={custom_timeout} from config, got {call_kwargs.kwargs.get('timeout')}"
)
class TestFlushMemoriesUsesAuxiliaryClient:
"""When an auxiliary client is available, flush_memories should use it
instead of self.client -- especially critical in Codex mode."""
def test_flush_uses_auxiliary_when_available(self, monkeypatch):
agent = _make_agent(monkeypatch, api_mode="codex_responses", provider="openai-codex")
mock_response = _chat_response_with_memory_call()
with patch("agent.auxiliary_client.call_llm", return_value=mock_response) as mock_call:
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there"},
{"role": "user", "content": "Remember this"},
]
with patch("tools.memory_tool.memory_tool", return_value="Saved.") as mock_memory:
agent.flush_memories(messages)
mock_call.assert_called_once()
call_kwargs = mock_call.call_args
assert call_kwargs.kwargs.get("task") == "flush_memories"

    def test_flush_uses_main_client_when_no_auxiliary(self, monkeypatch):
        """Non-Codex mode with no auxiliary falls back to self.client."""
        agent = _make_agent(monkeypatch, api_mode="chat_completions", provider="openrouter")
        agent.client = MagicMock()
        agent.client.chat.completions.create.return_value = _chat_response_with_memory_call()
        with patch("agent.auxiliary_client.call_llm", side_effect=RuntimeError("no provider")):
            messages = [
                {"role": "user", "content": "Hello"},
                {"role": "assistant", "content": "Hi there"},
                {"role": "user", "content": "Save this"},
            ]
            with patch("tools.memory_tool.memory_tool", return_value="Saved."):
                agent.flush_memories(messages)
        agent.client.chat.completions.create.assert_called_once()

    def test_auxiliary_provider_failure_surfaces_warning_and_falls_back(self, monkeypatch):
        """Provider/API failures from auxiliary flush must be visible.

        Exhausted keys and rate limits are not always RuntimeError. They used
        to fall into the broad outer handler and disappear into debug logs.
        """
        agent = _make_agent(monkeypatch, api_mode="chat_completions", provider="openrouter")
        agent.client = MagicMock()
        agent.client.chat.completions.create.return_value = _chat_response_with_memory_call()
        events = []
        agent.status_callback = lambda kind, text=None: events.append((kind, text))
        with patch("agent.auxiliary_client.call_llm", side_effect=Exception("opencode-go key exhausted")), \
             patch("tools.memory_tool.memory_tool", return_value="Saved."):
            messages = [
                {"role": "user", "content": "Hello"},
                {"role": "assistant", "content": "Hi there"},
                {"role": "user", "content": "Save this"},
            ]
            agent.flush_memories(messages)
        agent.client.chat.completions.create.assert_called_once()
        assert any(kind == "warn" and "Auxiliary memory flush failed" in text for kind, text in events)

    def test_flush_executes_memory_tool_calls(self, monkeypatch):
        """Verify that memory tool calls from the flush response actually get executed."""
        agent = _make_agent(monkeypatch, api_mode="chat_completions", provider="openrouter")
        mock_response = _chat_response_with_memory_call()
        with patch("agent.auxiliary_client.call_llm", return_value=mock_response):
            messages = [
                {"role": "user", "content": "Hello"},
                {"role": "assistant", "content": "Hi"},
                {"role": "user", "content": "Note this"},
            ]
            with patch("tools.memory_tool.memory_tool", return_value="Saved.") as mock_memory:
                agent.flush_memories(messages)
            mock_memory.assert_called_once()
            call_kwargs = mock_memory.call_args
            assert call_kwargs.kwargs["action"] == "add"
            assert call_kwargs.kwargs["target"] == "notes"
            assert "dark mode" in call_kwargs.kwargs["content"]

    def test_flush_bridges_memory_write_metadata(self, monkeypatch):
        """Flush memory writes notify external providers with flush provenance."""
        agent = _make_agent(monkeypatch, api_mode="chat_completions", provider="openrouter")
        agent._memory_manager = MagicMock()
        agent.session_id = "sess-flush"
        agent.platform = "cli"
        mock_response = _chat_response_with_memory_call()
        with patch("agent.auxiliary_client.call_llm", return_value=mock_response):
            messages = [
                {"role": "user", "content": "Hello"},
                {"role": "assistant", "content": "Hi"},
                {"role": "user", "content": "Note this"},
            ]
            with patch("tools.memory_tool.memory_tool", return_value="Saved."):
                agent.flush_memories(messages)
            agent._memory_manager.on_memory_write.assert_called_once()
            call_kwargs = agent._memory_manager.on_memory_write.call_args
            assert call_kwargs.args[:3] == ("add", "notes", "User prefers dark mode.")
            assert call_kwargs.kwargs["metadata"]["write_origin"] == "memory_flush"
            assert call_kwargs.kwargs["metadata"]["execution_context"] == "flush_memories"
            assert call_kwargs.kwargs["metadata"]["session_id"] == "sess-flush"

    def test_flush_strips_artifacts_from_messages(self, monkeypatch):
        """After flush, the flush prompt and any response should be removed from messages."""
        agent = _make_agent(monkeypatch, api_mode="chat_completions", provider="openrouter")
        mock_response = _chat_response_with_memory_call()
        with patch("agent.auxiliary_client.call_llm", return_value=mock_response):
            messages = [
                {"role": "user", "content": "Hello"},
                {"role": "assistant", "content": "Hi"},
                {"role": "user", "content": "Remember X"},
            ]
            original_len = len(messages)
            with patch("tools.memory_tool.memory_tool", return_value="Saved."):
                agent.flush_memories(messages)
            # Messages should not grow from the flush
            assert len(messages) <= original_len
            # No flush sentinel should remain
            for msg in messages:
                assert "_flush_sentinel" not in msg


class TestFlushMemoriesCodexFallback:
    """When no auxiliary client exists and we're in Codex mode, flush should
    use the Codex Responses API path instead of chat.completions."""

    def test_codex_mode_no_aux_uses_responses_api(self, monkeypatch):
        agent = _make_agent(monkeypatch, api_mode="codex_responses", provider="openai-codex")
        codex_response = SimpleNamespace(
            output=[
                SimpleNamespace(
                    type="function_call",
                    call_id="call_1",
                    name="memory",
                    arguments=json.dumps({
                        "action": "add",
                        "target": "notes",
                        "content": "Codex flush test",
                    }),
                ),
            ],
            usage=SimpleNamespace(input_tokens=50, output_tokens=10, total_tokens=60),
            status="completed",
            model="gpt-5-codex",
        )
        with patch("agent.auxiliary_client.call_llm", side_effect=RuntimeError("no provider")), \
             patch.object(agent, "_run_codex_stream", return_value=codex_response) as mock_stream, \
             patch.object(agent, "_build_api_kwargs") as mock_build, \
             patch("tools.memory_tool.memory_tool", return_value="Saved.") as mock_memory:
            mock_build.return_value = {
                "model": "gpt-5-codex",
                "instructions": "test",
                "input": [],
                "tools": [],
                "max_output_tokens": 4096,
            }
            messages = [
                {"role": "user", "content": "Hello"},
                {"role": "assistant", "content": "Hi"},
                {"role": "user", "content": "Save this"},
            ]
            agent.flush_memories(messages)
        mock_stream.assert_called_once()
        mock_memory.assert_called_once()
        assert mock_memory.call_args.kwargs["content"] == "Codex flush test"

    @pytest.mark.parametrize(
        "provider,base_url",
        [
            # chatgpt.com/backend-api/codex — rejects temperature unconditionally
            ("openai-codex", "https://chatgpt.com/backend-api/codex"),
            # Native OpenAI Responses — rejects temperature on gpt-5/o-series reasoning models
            ("openai", "https://api.openai.com/v1"),
            # Copilot Responses — rejects temperature on reasoning models
            ("copilot", "https://api.githubcopilot.com"),
        ],
    )
    def test_codex_fallback_never_sends_temperature(self, monkeypatch, provider, base_url):
        """Regression for the ``⚠ Auxiliary memory flush failed: HTTP 400:
        Unsupported parameter: temperature`` error.

        The codex_responses fallback must strip temperature before calling
        _run_codex_stream; the Responses API does not accept it on any
        supported backend, matching the transport's behavior."""
        agent = _make_agent(monkeypatch, api_mode="codex_responses", provider=provider)
        agent.base_url = base_url
        codex_response = SimpleNamespace(
            output=[
                SimpleNamespace(
                    type="function_call",
                    call_id="call_1",
                    name="memory",
                    arguments=json.dumps({
                        "action": "add",
                        "target": "notes",
                        "content": "no-temp test",
                    }),
                ),
            ],
            usage=SimpleNamespace(input_tokens=50, output_tokens=10, total_tokens=60),
            status="completed",
            model="gpt-5.5",
        )
        with patch("agent.auxiliary_client.call_llm", side_effect=RuntimeError("no provider")), \
             patch.object(agent, "_run_codex_stream", return_value=codex_response) as mock_stream, \
             patch.object(agent, "_build_api_kwargs") as mock_build, \
             patch("tools.memory_tool.memory_tool", return_value="Saved."):
            # Simulate a transport that (correctly) never includes temperature,
            # but also verify we strip any stray temperature the fallback used
            # to inject before the fix.
            mock_build.return_value = {
                "model": "gpt-5.5",
                "instructions": "test",
                "input": [],
                "tools": [],
                "max_output_tokens": 4096,
                # Intentionally poison the dict to prove we pop it:
                "temperature": 0.3,
            }
            messages = [
                {"role": "user", "content": "Hello"},
                {"role": "assistant", "content": "Hi"},
                {"role": "user", "content": "Save this"},
            ]
            agent.flush_memories(messages)
        mock_stream.assert_called_once()
        sent_kwargs = mock_stream.call_args.args[0]
        assert "temperature" not in sent_kwargs, (
            f"codex_responses fallback must strip temperature before calling "
            f"_run_codex_stream, got: {sent_kwargs.get('temperature')!r}"
        )


@@ -3078,48 +3078,6 @@ class TestRetryExhaustion:
        assert "bad messages" in result["error"]


# ---------------------------------------------------------------------------
# Flush sentinel leak
# ---------------------------------------------------------------------------


class TestFlushSentinelNotLeaked:
    """_flush_sentinel must be stripped before sending messages to the API."""

    def test_flush_sentinel_stripped_from_api_messages(self, agent_with_memory_tool):
        """Verify _flush_sentinel is not sent to the API provider."""
        agent = agent_with_memory_tool
        agent._memory_store = MagicMock()
        agent._memory_flush_min_turns = 1
        agent._user_turn_count = 10
        agent._cached_system_prompt = "system"
        messages = [
            {"role": "user", "content": "hello"},
            {"role": "assistant", "content": "hi"},
            {"role": "user", "content": "remember this"},
        ]
        # Mock the API to return a simple response (no tool calls)
        mock_msg = SimpleNamespace(content="OK", tool_calls=None)
        mock_choice = SimpleNamespace(message=mock_msg)
        mock_response = SimpleNamespace(choices=[mock_choice])
        agent.client.chat.completions.create.return_value = mock_response
        # Bypass auxiliary client so flush uses agent.client directly
        with patch("agent.auxiliary_client.call_llm", side_effect=RuntimeError("no provider")):
            agent.flush_memories(messages, min_turns=0)
        # Check what was actually sent to the API
        call_args = agent.client.chat.completions.create.call_args
        assert call_args is not None, "flush_memories never called the API"
        api_messages = call_args.kwargs.get("messages") or call_args[1].get("messages")
        for msg in api_messages:
            assert "_flush_sentinel" not in msg, (
                f"_flush_sentinel leaked to API in message: {msg}"
            )


# ---------------------------------------------------------------------------
# Conversation history mutation
# ---------------------------------------------------------------------------


@@ -721,14 +721,6 @@ auxiliary:
    base_url: ""
    api_key: ""
    timeout: 30
  # Memory flush — summarizes conversation for persistent memory
  flush_memories:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
    timeout: 30
```
:::tip


@@ -168,7 +168,6 @@ Hermes uses separate lightweight models for side tasks. Each task has its own pr
| Session Search | Past session summarization | `auxiliary.session_search` |
| Skills Hub | Skill search and discovery | `auxiliary.skills_hub` |
| MCP | MCP helper operations | `auxiliary.mcp` |
| Memory Flush | Memory consolidation | `auxiliary.flush_memories` |
| Approval | Smart command-approval classification | `auxiliary.approval` |
| Title Generation | Session title summaries | `auxiliary.title_generation` |
@@ -226,10 +225,6 @@ auxiliary:
  mcp:
    provider: "auto"
    model: ""
  flush_memories:
    provider: "auto"
    model: ""
```
Every task above follows the same **provider / model / base_url** pattern. Context compression is configured under `auxiliary.compression`:
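The actual `auxiliary.compression` block falls outside this hunk, but the per-task pattern just described suggests its shape. The following is only a sketch assuming the compression task mirrors the other auxiliary tasks; the exact keys and defaults are assumptions, not confirmed by this diff:

```yaml
auxiliary:
  # Assumed layout: same provider/model/base_url pattern as the tasks above
  compression:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
    timeout: 30
```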
@@ -365,7 +360,6 @@ See [Scheduled Tasks (Cron)](/docs/user-guide/features/cron) for full configurat
| Session search | Auto-detection chain | `auxiliary.session_search` |
| Skills hub | Auto-detection chain | `auxiliary.skills_hub` |
| MCP helpers | Auto-detection chain | `auxiliary.mcp` |
| Memory flush | Auto-detection chain | `auxiliary.flush_memories` |
| Approval classification | Auto-detection chain | `auxiliary.approval` |
| Title generation | Auto-detection chain | `auxiliary.title_generation` |
| Delegation | Provider override only (no automatic fallback) | `delegation.provider` / `delegation.model` |