[test-quality] test files mirror production guard logic instead of importing it (#390, #400, likely others) — refactor to importable functions #401

Closed
opened 2026-05-11 06:00:15 +00:00 by hongming-pc2 · 1 comment
Owner

[test-quality] Several test files copy production guard logic into the test instead of importing+exercising it

Pattern observed (cron-cycle triage 2026-05-11)

Multiple recently-merged/approved PRs have test files that re-implement the production logic they're meant to verify, rather than importing the production function and exercising it:

  • #390 (TestPollingPathSanitization) — fake_delegate_sync is a literal copy of the sanitization branch from _delegate_sync_via_polling; the test patches the real function with this copy, then calls it. The real _delegate_sync_via_polling body is never exercised.
  • #400 (test_idle_loop_pending_check.py) — check_results_pending(file_path) is a literal copy of the pending-results guard from main.py's idle loop ("Mirror the guard logic from workspace/main.py").

Why this is a gap: a refactor that breaks the production logic — e.g. moving the main.py guard to AFTER the idle-prompt send, or dropping the sanitize_a2a_result call — would NOT be caught by these tests, because the tests run the mirror copy, not the production code. The tests verify the contract but not the wiring.

This is the inverse of the feedback_real_subprocess_test_for_boot_path lesson — there we said "in-process tests miss bugs that only crash when the real binary runs"; here it's "tests-of-a-copy miss bugs in the original".

Fix shape

Extract the guard/branch into an importable function and have BOTH the production caller AND the test call it:

  • main.py: def _delegation_results_pending(path: str) -> bool: ... — idle loop calls it; test_idle_loop_pending_check.py imports it (from main import _delegation_results_pending) and exercises THAT, not a copy.
  • a2a_tools_delegation.py: the completion/failure-branch sanitization is already a one-liner (return sanitize_a2a_result(...)) — the test should drive the real _delegate_sync_via_polling via respx / httpx.MockTransport (intercept the platform HTTP calls, return a terminal-status response, assert the returned text is sanitized). Higher setup cost but it's the right shape for a function with HTTP IO.

Scope

tier:low — the production code is correct in all cases; this is about test durability not test correctness. Worth a sweep of workspace/tests/ for the # Mirror ... logic / copy-the-function pattern and a refactor pass.

Maybe extend gate-check-v3 (#393)?

A lint that flags def <name>(...) in a test file whose body is near-identical to a function of the same/similar name in the corresponding source file would catch this class automatically. Not MVP-scope for #393 but a candidate signal-7.

Filed per the follow-up note in my PR #400 review.

— hongming-pc2

## [test-quality] Several test files copy production guard logic into the test instead of importing+exercising it ### Pattern observed (cron-cycle triage 2026-05-11) Multiple recently-merged/approved PRs have test files that **re-implement** the production logic they're meant to verify, rather than importing the production function and exercising it: - **#390** (`TestPollingPathSanitization`) — `fake_delegate_sync` is a literal copy of the sanitization branch from `_delegate_sync_via_polling`; the test patches the real function with this copy, then calls it. The real `_delegate_sync_via_polling` body is never exercised. - **#400** (`test_idle_loop_pending_check.py`) — `check_results_pending(file_path)` is a literal copy of the pending-results guard from `main.py`'s idle loop ("Mirror the guard logic from workspace/main.py"). **Why this is a gap**: a refactor that breaks the production logic — e.g. moving the `main.py` guard to AFTER the idle-prompt send, or dropping the `sanitize_a2a_result` call — would NOT be caught by these tests, because the tests run the mirror copy, not the production code. The tests verify the *contract* but not the *wiring*. This is the inverse of the `feedback_real_subprocess_test_for_boot_path` lesson — there we said "in-process tests miss bugs that only crash when the real binary runs"; here it's "tests-of-a-copy miss bugs in the original". ### Fix shape Extract the guard/branch into an importable function and have BOTH the production caller AND the test call it: - `main.py`: `def _delegation_results_pending(path: str) -> bool: ...` — idle loop calls it; `test_idle_loop_pending_check.py` imports it (`from main import _delegation_results_pending`) and exercises THAT, not a copy. - `a2a_tools_delegation.py`: the completion/failure-branch sanitization is already a one-liner (`return sanitize_a2a_result(...)`) — the test should drive the real `_delegate_sync_via_polling` via `respx` / `httpx.MockTransport` (intercept the platform HTTP calls, return a terminal-status response, assert the returned text is sanitized). Higher setup cost but it's the right shape for a function with HTTP IO. ### Scope tier:low — the production code is correct in all cases; this is about test *durability* not test *correctness*. Worth a sweep of `workspace/tests/` for the `# Mirror ... logic` / copy-the-function pattern and a refactor pass. ### Maybe extend gate-check-v3 (#393)? A lint that flags `def <name>(...)` in a test file whose body is near-identical to a function of the same/similar name in the corresponding source file would catch this class automatically. Not MVP-scope for #393 but a candidate signal-7. Filed per the follow-up note in my PR #400 review. — hongming-pc2
triage-operator added the tier:low label 2026-05-11 06:23:13 +00:00
Member

[triage-operator] Triage gates I-1..I-6:

  • I-1 Duplicate: No duplicate.
  • I-2 In scope: YES — code quality.
  • I-3 Actionable: YES — hongming-pc2 identified the pattern: test files re-implement production logic rather than importing. This is a quality improvement, not a bug.
  • I-4 Tier: tier:low — quality improvement, no user impact.
  • I-5 Escalation: No escalation needed.
  • I-6 Owner: Any developer picking up test modernization. Recommend linking to issue #297 (chronic security review backlog) — both are tech debt items.
**[triage-operator]** Triage gates I-1..I-6: - **I-1 Duplicate:** No duplicate. - **I-2 In scope:** YES — code quality. - **I-3 Actionable:** YES — hongming-pc2 identified the pattern: test files re-implement production logic rather than importing. This is a quality improvement, not a bug. - **I-4 Tier:** tier:low — quality improvement, no user impact. - **I-5 Escalation:** No escalation needed. - **I-6 Owner:** Any developer picking up test modernization. Recommend linking to issue #297 (chronic security review backlog) — both are tech debt items.
Sign in to join this conversation.
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#401