fix(a2a): restore TTL cache check in enrich_peer_metadata_nonblocking
Some checks failed
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 10m34s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 9s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 14s
Harness Replays / detect-changes (pull_request) Successful in 19s
Check migration collisions / Migration version collision check (pull_request) Successful in 28s
qa-review / approved (pull_request) Failing after 15s
CI / Detect changes (pull_request) Successful in 30s
E2E API Smoke Test / detect-changes (pull_request) Successful in 32s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 29s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 34s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
security-review / approved (pull_request) Failing after 16s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 36s
gate-check-v3 / gate-check (pull_request) Failing after 32s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 36s
sop-checklist-gate / gate (pull_request) Successful in 14s
Harness Replays / Harness Replays (pull_request) Successful in 5s
sop-tier-check / tier-check (pull_request) Successful in 13s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 47s
CI / Shellcheck (E2E scripts) (pull_request) Failing after 18s
MCP Stdio Transport Regression / MCP stdio with regular-file stdout (pull_request) Failing after 1m35s
Runtime Pin Compatibility / PyPI-latest install + import smoke (pull_request) Successful in 2m16s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3m14s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 3m24s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3m42s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m23s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Successful in 5m33s
CI / Platform (Go) (pull_request) Failing after 5m49s
CI / Canvas (Next.js) (pull_request) Failing after 6m36s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 7m29s

The stdio-fallback branch removed the cache-first check from
enrich_peer_metadata_nonblocking, causing 5 tests to fail:

  test_envelope_enrichment_uses_cache_when_present
  test_envelope_enrichment_fetches_on_cache_miss
  test_envelope_enrichment_re_fetches_after_ttl
  test_enrich_peer_metadata_nonblocking_cache_hit_returns_immediately
  test_enrich_peer_metadata_nonblocking_cache_miss_schedules_fetch

The removed lines checked the peer metadata cache (TTL-bounded) and
returned immediately on a cache hit. Without this, every push for a
known peer schedules a background fetch — a performance regression
and a deviation from the documented contract (PR #2484).

This patch restores the cache check to the exact original logic.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Molecule AI · infra-runtime-be 2026-05-13 11:09:54 +00:00
parent bdce95663d
commit c2325f1a17

View File

@ -187,11 +187,19 @@ def enrich_peer_metadata_nonblocking(
canon = _validate_peer_id(peer_id)
if canon is None:
return None
# Schedule background fetch unless one is already in flight for this
# peer. The synchronous version atomically reads-then-writes; the
# async version splits that into "schedule fetch" + "fetch fills
# cache later." The in-flight set keeps a flurry of pushes from
# one peer (e.g., a chatty agent) from spawning N parallel GETs.
# Cache hit (fresh): return without blocking on a registry GET.
# This is the hot path for active peer conversations — avoids
# spawning a background thread for every push from a known peer.
current = time.monotonic()
cached = _peer_metadata_get(canon)
if cached is not None:
fetched_at, record = cached
if current - fetched_at < _PEER_METADATA_TTL_SECONDS:
return record
# Cache miss or TTL expired: schedule background fetch unless one is
# already in flight for this peer. The in-flight set keeps a flurry
# of pushes from one peer (e.g., a chatty agent) from spawning N
# parallel GETs.
with _enrich_in_flight_lock:
if canon in _enrich_in_flight:
return None