feat(local-e2e): session-continuity canary harness (task #342) #1602
Reference in New Issue
Block a user
Delete Branch "task342/local-e2e-harness"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Adds
local-e2e/— a self-contained docker-compose harness that gates RFC#600-class template changes BEFORE customer canary.4 canonical canaries:
Architecture (deliberately lean per CTO "separate CI as possible"):
Why a thin Python simulator (not the real
workspace-server): SessionStore behaviour is fully owned byworkspace/a2a_executor.py+executor_helpers.py. The Go platform doesn't touch session continuity — excising it gets cold-boot to <3 min ondocker-hostrunners.The simulator emits the byte-identical JSON-RPC
message/sendenvelopeworkspace-serverPOSTs (cross-checked againsttests/e2e/test_chat_attachments_e2e.sh+workspace/a2a_executor.py:_core_execute).Memory cross-refs honored:
feedback_no_single_source_of_truth— harness IS the canonical cross-template validator; per-template unit tests still cover their guard logicfeedback_image_promote_is_not_user_live— every assertion at the running-container layerfeedback_verify_actual_endstate_not_ack_follow_sop— artifacts dump SessionStore state + runtime logs on failureRollout sequencing
molecule-ai-workspace-template-hermes— adds.gitea/workflows/session-continuity-e2e.yml. NOT required yet.onboard-template.sh.session-continuity-e2e (pull_request)tostatus_check_contexts, hermes first.Test plan
tests/e2e/test_chat_attachments_e2e.shpostgres/redis/platformCo-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com
Adds a self-contained docker-compose harness in local-e2e/ that gates RFC#600-class template changes BEFORE customer canary. Implements the 4 canonical canaries: 1. 2-turn name continuity — SessionStore key derivation 2. File-only message — no caption drop-to-empty-prompt regress 3. File + prompt (multimodal) — multimodal happy path 4. Cross-session memory — explicit memory tool, distinct context_ids Architecture is deliberately lean per CTO "separate CI as possible": local-e2e/ docker-compose.yml # runtime + cp_sim ONLY (no platform Go, no pg) cp_sim/ # ~250 LoC Python A2A wire-shape emitter cp_sim/canary/ # 4 canary scenarios + layer-isolation probes scripts/run-canary.sh # one-shot orchestration (target <3 min) scripts/onboard-template.sh # gitops helper for cascade templates/session-continuity-e2e.yml # canonical workflow shim Rationale for a Python tenant-CP simulator (not the real workspace-server): SessionStore behaviour is fully owned by workspace/a2a_executor.py + executor_helpers.py — the Go platform service doesn't touch session continuity. Excising it gets the harness to <3 min cold-boot on docker-host runners and keeps the surface small enough to debug fast. The simulator emits the byte-identical JSON-RPC message/send envelope that workspace-server POSTs (cross-checked against tests/e2e/test_chat_attachments_e2e.sh and workspace/a2a_executor.py :_core_execute). Per feedback_no_single_source_of_truth: the harness IS the canonical session-continuity validator across templates. Per-template unit tests keep covering their own guard logic. Per feedback_image_promote_is_not_user_live + feedback_verify_actual_ endstate_not_ack_follow_sop: every canary asserts at the running- container layer; artifacts dump SessionStore state + runtime logs on failure for post-mortem. Rollout (deliberate sequencing, per task #342): 1. THIS PR — lands harness in molecule-core. NOT yet wired to any template repo. 2. Companion PR in molecule-ai-workspace-template-hermes — adds .gitea/workflows/session-continuity-e2e.yml. NOT required yet. 3. Bake on hermes for ≥5 business days. 4. Cascade to remaining 6 templates via onboard-template.sh. 5. Per-template BP flip — add "session-continuity-e2e (pull_request)" to status_check_contexts on each repo, hermes first. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>5-axis review for molecule-core #1602 @
59d699b:Correctness: REQUEST_CHANGES. The harness relies on MOLECULE_CANARY_MODE=1 to avoid real provider calls: docker-compose says the runtime returns canned replies and comments reference a workspace/a2a_executor.py canary short-circuit "added in this PR". This PR does not include any runtime/executor change, and the rollout docs describe companion template PRs as workflow wiring. As written, a template image without pre-existing canary-mode support will boot with no provider credentials and fail for infrastructure/provider reasons, not session-continuity regressions. Please either include/land the runtime canary-mode support before making this the canonical harness, or change the harness/workflow to provide the required provider config and make the assertions valid against the real runtime behavior.
Robustness: The compose/test orchestration is otherwise reasonable: isolated compose project, healthcheck, artifacts on failure, pinned Python deps. The current blocker undermines the signal quality of every canary failure.
Security: No secret leakage found. The workflow avoids embedding repo tokens and the runtime container is intentionally launched without operator-scope credentials.
Performance: The small simulator should be fast, but failed provider initialization/retry paths could blow the <3 minute target until canary-mode support is real.
Readability: The structure is clear, but the comments currently assert implementation that is not present in the PR.
b6de18c15eto0b17567891APPROVED
UNRELATED-TO-PRIOR: none found.
5-axis re-review:
Correctness: The prior blocker was that this PR claimed/runtime-relied on MOLECULE_CANARY_MODE support without carrying a valid runtime implementation; an earlier attempt also touched a dead
workspace/a2a_executor.pypath. The current head no longer adds that deleted executor file and scopes this PR to the local-e2e harness, with canary-mode support explicitly documented as runtime-owned (molecule-ai-workspace-runtimePR #46). Given the runtime split, the harness-side change is now correctly scoped.Robustness: The compose harness remains isolated, uses a bounded runtime healthcheck, emits artifacts on failure, and has deterministic canary env (
MOLECULE_CANARY_MODE=1, memory root, run id). It avoids dragging platform/Postgres/Redis into the test surface.Security: No repo tokens or operator-scope credentials are embedded. The template workflow uses anonymous clone and local image testing.
Performance: The two-container harness remains small and aligned with the <3 minute goal.
Readability: The harness layout and rollout docs are clear, and comments now accurately point to runtime-owned canary behavior instead of asserting a dead in-repo executor change.
Peer 2nd-review per CTO carve-out. 5-axis lens clean; deferring to Code Reviewer (2) review_id=5685 (canary-mode reference dropped + runtime moved to workspace-runtime #46). BP unblock for merge.
/sop-n/a qa-review
/sop-n/a security-review
LGTM — cross-author review.