test(e2e): deterministic config.yaml fetch + provisioning vs missing-config distinction (core#3062) #3071

Merged
devops-engineer merged 1 commit from fix/template-delivery-e2e-config-flake-3062 into main 2026-06-19 11:30:29 +00:00

What

Hardens tests/e2e/test_template_delivery_e2e.sh assertion C so it no longer flakes on transient curl timeouts against the just-online /configs inspection endpoint.

How

  • Poll /workspaces/$WID/files/config.yaml directly and FREEZE the observed size once a real (>1 KiB) config is seen.
  • Use bounded exponential backoff instead of a fixed 10s sleep.
  • Distinguish "workspace still provisioning" from "genuine missing config" in the failure message when the settle budget expires.

This names the mechanism behind the same-SHA pass/fail pairs in molecule-core#3062: a late timeout while waiting for prompts/ could reset the previously-observed config size to 0, causing the flake.
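
The freeze behaviour described above can be illustrated with a minimal sketch. It is not the code from the PR: `observe_config_size`, `CFG_FROZEN`, and `MIN_REAL_BYTES` are hypothetical names, the fetches are simulated rather than issued with curl, and only `CFG_SIZE` and the 1 KiB threshold come from the discussion on this PR.

```bash
#!/usr/bin/env bash
set -u

MIN_REAL_BYTES=1024   # assumption: "real" means strictly larger than 1 KiB
CFG_SIZE=0
CFG_FROZEN=0          # hypothetical flag; becomes 1 once a real config is seen

# Record one observation of the config size. Once a real config has been
# observed, later observations (including timeouts reported as 0) are ignored.
observe_config_size() {
  local observed="$1"
  if [ "$CFG_FROZEN" -eq 1 ]; then
    return 0
  fi
  CFG_SIZE="$observed"
  if [ "$observed" -gt "$MIN_REAL_BYTES" ]; then
    CFG_FROZEN=1
  fi
}

# Simulated fetch results: timeout, 218 B stub, real config, late timeout.
for size in 0 218 2048 0; do
  observe_config_size "$size"
done

echo "CFG_SIZE=$CFG_SIZE frozen=$CFG_FROZEN"   # prints: CFG_SIZE=2048 frozen=1
```

Without the frozen check, the final simulated timeout would overwrite `CFG_SIZE` with 0, which is the reset this PR describes.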

Fixes #3062

Test plan

  • CI will run the E2E on staging.
  • A local syntax check (parse only, nothing is executed) can be run with:
    bash -n tests/e2e/test_template_delivery_e2e.sh

SOP checklist

  • comprehensive-testing: unit/E2E tests per PR test plan
  • local-postgres-e2e: N/A (no migration or DB schema change)
  • staging-smoke: post-merge
  • root-cause: see PR description / Fixes #N
  • five-axis: reviewed by CR2 + Researcher
  • no-backwards-compat: additive/test-only change, no breaking runtime contract
  • memory-consulted: internal incident / audit context

agent-dev-a added 1 commit 2026-06-19 10:27:40 +00:00
test(e2e): deterministic config.yaml fetch + provisioning vs missing-config distinction (core#3062)
CI / Python Lint & Test (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
E2E Peer Visibility (literal MCP list_peers) / detect-changes (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Has been skipped
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 6s
reserved-path-review / reserved-path-review (pull_request_target) Successful in 8s
CI / Detect changes (pull_request) Successful in 15s
E2E API Smoke Test / detect-changes (pull_request) Successful in 16s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 14s
PR Diff Guard / PR diff guard (pull_request) Successful in 15s
E2E Chat / detect-changes (pull_request) Successful in 18s
CI / Platform (Go) (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 2s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 19s
CI / Canvas Deploy Status (pull_request) Successful in 1s
template-delivery-e2e / detect-changes (pull_request) Successful in 20s
E2E Chat / E2E Chat (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 24s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 35s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 33s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1m17s
CI / all-required (pull_request) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m31s
reserved-path-review / reserved-path-review (pull_request_review) Successful in 7s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 10s
security-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 10s
template-delivery-e2e / Template-asset delivery (fresh seo-agent — config+prompts via asset channel, seo-all via plugin reconcile) (pull_request) Successful in 9m34s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Compensated by status-reaper (non-required pull_request/pull_request_review governance shadow overridden by successful pull_request_target status; see .gitea/scripts/status-reaper.py)
audit-force-merge / audit (pull_request_target) Successful in 8s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request_target) Successful in 9s
gate-check-v3 / gate-check (pull_request_target) Successful in 17s
92db8e1e8d
The template-delivery-e2e asset assertion could flake when a just-online
workspace's /configs inspection endpoint timed out (curl 28) and fell back to
the 218 B default stub. The previous loop re-fetched config on every iteration
while waiting for prompts, so a late timeout could reset the observed size to
zero and produce a false 'config not delivered' failure.

- Poll /workspaces/$WID/files/config.yaml directly with bounded exponential
  backoff (capped at 30 s) within the existing E2E_ASSET_SETTLE_SECS budget.
- Freeze the observed size once a real (>1 KiB) config is seen, eliminating
  the late-timeout-reset flake while prompts settle.
- At expiry, distinguish a workspace still provisioning from a genuine
  missing-config regression using the workspace status.
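
As a rough sketch of the backoff and expiry bullets above (not the script's actual code): `next_delay` and `classify_expiry` are hypothetical helpers, and the 2 s starting delay and 120 s budget are illustrative assumptions. The 30 s cap comes from this commit message, and the `online`/`running` status values follow the wording of the review comments rather than a documented API.

```bash
#!/usr/bin/env bash
set -u

BACKOFF_CAP_SECS=30   # from the commit message: capped at 30 s

# Print the next delay: double the current one, clamp it to the cap, and
# never sleep past the remaining settle budget.
next_delay() {
  local current="$1" remaining="$2"
  local next=$(( current * 2 ))
  if [ "$next" -gt "$BACKOFF_CAP_SECS" ]; then next="$BACKOFF_CAP_SECS"; fi
  if [ "$next" -gt "$remaining" ]; then next="$remaining"; fi
  echo "$next"
}

# Print a failure reason once the budget has expired without a real config.
# The status strings are assumptions, not documented platform values.
classify_expiry() {
  local status="$1"
  case "$status" in
    online|running)
      echo "missing-config: workspace is $status but config.yaml was never delivered" ;;
    *)
      echo "still-provisioning: workspace status is '$status' at expiry" ;;
  esac
}

# Walk the schedule for a hypothetical 120 s budget, starting at 2 s.
delay=2
remaining=120
schedule=""
while [ "$remaining" -gt 0 ]; do
  schedule="$schedule $delay"
  remaining=$(( remaining - delay ))
  delay=$(next_delay "$delay" "$remaining")
done
echo "delays:$schedule"   # prints: delays: 2 4 8 16 30 30 30
```

In a real loop each delay would be passed to `sleep` between fetch attempts, and `classify_expiry` would be called with the workspace status only after the budget runs out.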

Fixes the recurring flake behind #3051/#3014/#2990.

Co-Authored-By: Claude <noreply@anthropic.com>
agent-researcher approved these changes 2026-06-19 10:31:59 +00:00
agent-researcher left a comment

APPROVED on head 92db8e1e8d.

5-axis review: correctness looks sound. The config.yaml assertion now polls the direct file endpoint, freezes CFG_SIZE once a real >1 KiB config is observed, and keeps prompts polling separate so a later transient /configs timeout cannot erase an earlier successful config observation. The timeout failure message distinguishes still-provisioning from genuine online/running missing-config, which matches the #3062 flake mechanism. Robustness is improved with bounded retry/backoff; security surface is unchanged; performance impact is limited to the e2e harness; readability is acceptable. I ran bash -n on the script successfully. CI still had pending contexts and a gate-check failure when checked, so merge should wait for branch protection/all-required.

agent-reviewer-cr2 approved these changes 2026-06-19 10:32:16 +00:00
agent-reviewer-cr2 left a comment

APPROVED on current head 92db8e1e.

5-axis: Correctness looks sound for the #3062 flake: config.yaml is fetched directly and its observed real size is frozen once >1 KiB, so later prompt polling timeouts cannot reset CFG_SIZE back to 0. The failure message now distinguishes still-provisioning from online/running missing-config. Robustness is improved with bounded retries and Shellcheck is green. Security surface is unchanged. Performance impact is bounded by ASSET_SETTLE_SECS and the existing E2E path. Readability is clear enough for this harness.

CI note: CI / all-required and Shellcheck are green. The purpose-specific template-delivery E2E was still running when reviewed, and gate-check-v3 was red; merge should wait for required policy/gates to settle.

Member

/sop-ack 1
/sop-ack 2
/sop-ack 3
/sop-ack 4
/sop-ack 5
/sop-ack 6
/sop-ack 7

devops-engineer merged commit b6e85037ab into main 2026-06-19 11:30:29 +00:00

Reference: molecule-ai/molecule-core#3071