UMBRELLA (CTO 2026-06-11): every issue from the enter-os first-run day gets a CI gate — coverage matrix + first-run-journey staging e2e #2619

Open
opened 2026-06-12 01:20:06 +00:00 by core-devops · 0 comments

CTO directive: "every issue we found should all be in our CI testing pipeline including e2e." This umbrella tracks the full 2026-06-11 haul to gated coverage. SOP #765 (fix ⇒ regression test) applied retroactively to the day.

Coverage matrix

| # | Issue / fix | Unit/CI today | E2E gap → spec |
|---|---|---|---|
| 1 | #2573 platform-root restart guard (#2603, merged) | ✅ handler tests (platform skip, fan-out SQL pin, fail-closed) | ☐ J1: secret write+delete on the concierge → workspace stays `online` (live-tested by hand today; automate in journey) |
| 2 | mcp-server#63 duplicate tool reg killed mgmt-mode boot | ✅ throwing-mock + composed-server tests; ✅ image smoke gate (template#112) live | covered: the smoke gate IS the e2e (its first real run caught #63) |
| 3 | template#112 bash-quoting silently masked the smoke gate | ✅ smoke now actually executes | ☐ cheap lint: `bash -n` over extracted `run:` blocks in publish-image.yml so a future quoting bug fails the workflow, not the gate's meaning |
| 4 | CP#730 tunnel-ingress drift (tabs-empty) (merged+deployed) | ✅ redeployer unit tests; ✅ cf-tunnel-drift code gate | ☐ J2: after tenant redeploy, `tunnel_ingress_synced=true` AND `/requests/pending` returns JSON through the real edge (HTML = the exact regression) |
| 5 | cp#729 SM secret-deletion race (silent provision fail) | ❌ FIX OPEN | fix + unit (retry/backoff on `scheduled for deletion`) + J3 exercises an image-change restart cycle |
| 6 | #2609 orphan parenting (#2610, in review) | ✅ create-default + backfill tests | ☐ J4: concierge-provisioned workspace has `parent_id == org root` and a delegation to it completes |
| 7 | #2611 double-provision race + terminal register-401 | ❌ FIX OPEN | fix + units (single-flight recreate; bootstrap gate reopens when zero live tokens) + J5: exactly ONE instance per workspace; register 200; **proof-of-life = agent_log/heartbeat within deadline, not `status=online`** (today's core lesson: online lies) |
| 8 | #2608 BYOK-unsatisfiable create (#2617, in review) | ✅ 422-no-insert, global-cred passes, platform-id query-free, discovery payload | ☐ J6 negative: bare claude id on fresh org → 422 MISSING_BYOK_CREDENTIAL; J6 positive: `offered-models` lookup → `moonshot/kimi-k2.6` → keyless provision → online |
| 9 | #2614 respond → requester A2A turn (in review) | ✅ notify routing/idem/body tests | ☐ J7: user responds Done → requester agent's queue/activity shows the delivered turn |
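
The J2 assertion in row 4 (JSON through the real edge, with HTML being the regression) could be sketched roughly as below. This is a minimal sketch, not the harness's code: `checkJSONNotHTML` is a hypothetical helper name, and the expectation that the edge sets an `application/json` content type is an assumption the issue does not state.

```go
package main

import (
	"encoding/json"
	"fmt"
	"mime"
)

// checkJSONNotHTML is a hypothetical helper for the J2 check. It takes the
// Content-Type header and body of a /requests/pending response fetched
// through the edge and returns nil only when both look like JSON.
// Assumption: a healthy response is served as application/json.
func checkJSONNotHTML(contentType string, body []byte) error {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return fmt.Errorf("unparseable Content-Type %q: %w", contentType, err)
	}
	if mediaType != "application/json" {
		// An HTML page here is the tunnel-ingress drift regression.
		return fmt.Errorf("got media type %q, want application/json", mediaType)
	}
	if !json.Valid(body) {
		return fmt.Errorf("body is not valid JSON (%d bytes)", len(body))
	}
	return nil
}
```

Checking the body as well as the header guards against an edge that labels an HTML fallback page as JSON.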

The carrier: first_run_journey_test.go (staginge2e)

One gated journey in the EXISTING harness (reuse adminCreateOrg / tenantAdminToken / tenantCreateWorkspace / teardown — no new plumbing): fresh org → platform-agent install → J6 negative → J6 positive → J4 → J5 proof-of-life → J1 → create_request to user → respond → J7 → J2 → purge. This is exactly the journey a real first user (enter-os, 2026-06-11) executed and broke four different ways; it becomes the acceptance test for #2611 and cp#729 fixes.

Wire as a job in e2e-staging-saas.yml next to the concierge suite. Gating posture per the no-flakes rule: any intermittent failure gets a named mechanism or stays red — no advisory parking.

Sequencing

  1. Journey skeleton with J1/J2/J4/J6/J7 (all fixes merged or in review now) — buildable immediately.
  2. #2611 + cp#729 fixes land WITH their units; J3/J5 flip on with them.
  3. template lint (item 3) — one workflow step, independent.

Refs: #2573 #2603 #2605 #2608 #2609 #2610 #2611 #2614 #2617, mcp-server#62/#63, template#111/#112/#113, cp#729/#730.

Reference: molecule-ai/molecule-core#2619