test(e2e): harden staging canvas Playwright suite toward HARD merge-gate
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 1s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 3s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 3s
Harness Replays / detect-changes (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 6s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 3s
E2E API Smoke Test / detect-changes (pull_request) Successful in 10s
E2E Chat / detect-changes (pull_request) Successful in 11s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 3s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
qa-review / approved (pull_request_target) Failing after 4s
sop-checklist / review-refire (pull_request_target) Has been skipped
security-review / approved (pull_request_target) Failing after 5s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 5s
CI / Platform (Go) (pull_request) Successful in 1s
sop-tier-check / tier-check (pull_request_target) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1s
Harness Replays / Harness Replays (pull_request) Successful in 2s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1s
gate-check-v3 / gate-check (pull_request_target) Successful in 17s
E2E Chat / E2E Chat (pull_request) Successful in 7s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m4s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m14s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m11s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m18s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m26s
CI / Canvas (Next.js) (pull_request) Successful in 6m19s
CI / all-required (pull_request) Successful in 3s
CI / Canvas Deploy Status (pull_request) Has been skipped
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Successful in 11s
audit-force-merge / audit (pull_request_target) Successful in 3s
Deflake the staging canvas tab E2E so it can become a required check
(continue-on-error stays per RFC internal#219 §1 / CTO call — NOT removed).
Each flake/weak-gate mechanism is named and fixed deterministically
(§ No flakes / internal#828). Does NOT touch staging-display.spec.ts
(in-flight PR #2275).

staging-tabs.spec.ts:
- Weak "container visible" gate shipped empty/errored panels green: the
  single tabpanel div always mounts. Replaced with assertPanelRendered():
  settled REAL content via expect.poll (non-empty, not stuck on a loading
  spinner) for non-degraded tabs. Mechanism: polled content condition
  instead of implicit "network finished by now".
- ErrorBoundary ("Something went wrong") was never asserted — a React
  subtree crash passed. Now asserted absent at hydration AND per tab.
- Error detection was [role=alert]:has-text("Failed to load") ONLY: missed
  other error phrasings and role-less error divs (ActivityTab). Replaced
  with any *visible* alert inside the panel for non-degraded tabs.
- Hand-maintained TAB_IDS could drift silently from SidePanel.tsx TABS (it
  was already stale: missing display + container-config). Added a live-DOM
  parity guard (fails loud on a new/removed tab); display +
  container-config explicitly excluded (display owned by PR #2275).
- Added click→activation confirmation (aria-selected) before asserting the
  panel — closes a wrong-panel race on slow click handlers.
- Fail-closed: CANVAS_E2E_STAGING=1 with no tenant state now hard-errors
  (was a silent skip→green path); unset env still skips cleanly.
- Added PROMOTION-READINESS block (reliable now / still-blocks-required /
  checklist).

staging-setup.ts:
- Fail-closed handoff: empty slug/tenantURL/workspaceId/tenantToken now
  hard-fails setup naming the missing field, instead of handing off a
  partial state the spec diagnoses (or skips) downstream.

e2e-staging-canvas.yml:
- PROMOTION-READINESS comment (what's reliable / what still blocks
  promotion-to-required). continue-on-error untouched.

Verified without live infra: tsc --noEmit clean on all three e2e files;
playwright --list collects the staging spec; suite self-skips clean with no
STAGING env (exit 0) and hard-errors loud with CANVAS_E2E_STAGING=1 and no
token (exit !=0). Full live suite needs staging infra — not run here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -12,9 +12,30 @@ name: E2E Staging Canvas (Playwright)
 #

 # Playwright test suite that provisions a fresh staging org per run and
-# verifies every workspace-panel tab renders without crashing. Complements
-# e2e-staging-saas.yml (which tests the API shape) by exercising the
-# actual browser + canvas bundle against live staging.
+# verifies every workspace-panel tab renders REAL content (not just an
+# empty/errored container). Complements e2e-staging-saas.yml (which tests
+# the API shape) by exercising the actual browser + canvas bundle against
+# live staging.
+#
+# PROMOTION-READINESS (toward making this a HARD merge-gate):
+# NOW RELIABLE (spec hardened — staging-tabs.spec.ts):
+# - All waits condition-based (toBeVisible/toHaveAttribute/expect.poll);
+# no fixed waitForTimeout in the spec.
+# - Tabs asserted on settled REAL content, not "container visible".
+# - ErrorBoundary + visible error alerts fail non-degraded tabs.
+# - Tab-list parity-checked vs live DOM; fail-closed on missing tenant.
+# STILL BLOCKS PROMOTION-TO-REQUIRED (do NOT remove continue-on-error —
+# CTO-owned, RFC internal#219 §1):
+# - Infra dependency: real staging EC2 per run (12-20 min cold boot);
+# AWS/Cloudflare/CP availability would become merge-blockers.
+# - Shared-zone TLS/DNS/ACME propagation flake surface is upstream of
+# this repo and outside its control.
+# - Required-gate correctness needs CP_STAGING_ADMIN_API_TOKEN GUARANTEED
+# present; today's skip-if-absent (core#2225) is right for non-gating
+# but would skip-green a required check.
+# - Single hermes/platform_managed workspace; agent-dependent content
+# (live chat/traces round-trip) not exercised on staging (#2162).
+# The full checklist lives at the foot of canvas/e2e/staging-tabs.spec.ts.
 #
 # Triggers: push to main, PR touching canvas sources + this workflow only
 # after the PR enters `merge-queue`, manual dispatch, and scheduled cron to
@@ -337,10 +337,26 @@ export default async function globalSetup(_config: FullConfig): Promise<void> {

   // 7. Hand state off to tests + teardown — overwrite the slug-only
   // bootstrap state with the full state spec tests need.
-  writeFileSync(
-    stateFile,
-    JSON.stringify({ slug, tenantURL, workspaceId, tenantToken }, null, 2),
-  );
+  //
+  // FAIL-CLOSED handoff: every field the spec reads must be non-empty. If
+  // any is missing here, the spec's env-presence guard would throw with a
+  // generic "did setup run?" message that hides WHICH field was lost. Catch
+  // it at the source — a partial provision must hard-fail setup, never hand
+  // off a half-built state that the spec then has to diagnose (or worse,
+  // skip). This is the loud, fail-closed contract: STAGING was requested,
+  // so an incomplete provision is an error, not a skip.
+  const handoff = { slug, tenantURL, workspaceId, tenantToken };
+  const missingFields = Object.entries(handoff)
+    .filter(([, v]) => !v)
+    .map(([k]) => k);
+  if (missingFields.length > 0) {
+    throw new Error(
+      `[staging-setup] provision incomplete — empty handoff field(s): ` +
+        `${missingFields.join(", ")}. Refusing to hand off a partial state ` +
+        `that would surface downstream as an opaque spec failure.`,
+    );
+  }
+  writeFileSync(stateFile, JSON.stringify(handoff, null, 2));
   process.env.STAGING_SLUG = slug;
   process.env.STAGING_TENANT_URL = tenantURL;
   process.env.STAGING_WORKSPACE_ID = workspaceId;
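The fail-closed handoff in the hunk above can be pictured in isolation. The following is a minimal sketch, not the code in staging-setup.ts: the `Handoff` type and the `findMissingFields` / `assertCompleteHandoff` names are hypothetical, introduced only for illustration, and the field names are taken from the diff.

```typescript
// Hypothetical standalone version of the handoff check (names are assumptions).
type Handoff = Record<string, string | undefined>;

// Return the keys whose values are empty, undefined, or otherwise falsy.
function findMissingFields(handoff: Handoff): string[] {
  return Object.entries(handoff)
    .filter(([, value]) => !value)
    .map(([key]) => key);
}

// Throw, naming every empty field, instead of handing off a partial state.
function assertCompleteHandoff(handoff: Handoff): void {
  const missing = findMissingFields(handoff);
  if (missing.length > 0) {
    throw new Error(
      `provision incomplete, empty handoff field(s): ${missing.join(", ")}`,
    );
  }
}
```

Validating before the state file is written means an incomplete provision never reaches the spec, so the failure message can name the lost field at the point where it was lost.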
+305
-33
@@ -1,7 +1,8 @@
 /**
- * Staging canvas E2E — opens each of the 13 workspace-panel tabs against a
- * fresh staging org provisioned in the global setup. Asserts each tab
- * renders without throwing and captures a screenshot for visual review.
+ * Staging canvas E2E — opens each workspace-panel tab against a fresh
+ * staging org provisioned in the global setup. Asserts each tab renders
+ * REAL content (not an empty container, not an error state) and captures a
+ * screenshot for visual review.
  *
  * Auth model: the tenant platform's AdminAuth middleware accepts a bearer
  * token OR a WorkOS session cookie. Playwright can't mint a WorkOS
@@ -10,17 +11,39 @@
  * Bearer header via context.setExtraHTTPHeaders(). Every browser
  * request inherits the header.
  *
- * Known SaaS gaps — documented in #1369 and allowed to render errored
- * content without failing the test (the gate is "no hard crash, no
- * 'Failed to load' toast"):
+ * PROMOTION-READINESS (see § at bottom of file): this suite is being
+ * hardened toward becoming a HARD merge-gate. It currently runs under
+ * `continue-on-error: true` (RFC internal#219 §1, non-gating) — that is a
+ * deliberate, CTO-owned call and is NOT changed here. The hardening makes
+ * every assertion deterministic so that WHEN promotion happens the gate
+ * does not flap. See the PROMOTION-READINESS block at the foot of this
+ * file for what is now reliable and what still blocks promotion.
+ *
+ * Known SaaS gaps — documented in #1369. These tabs legitimately cannot
+ * load real content in SaaS mode and are allowed an in-panel empty/error
+ * state (NOT a hard crash, NOT an ErrorBoundary):
  * - Files tab: empty (platform can't docker exec into a remote EC2)
  * - Terminal tab: WS connect fails
  * - Peers tab: 401 without workspace-scoped token
+ * These are enumerated in KNOWN_DEGRADED_TABS below and asserted with a
+ * weaker (but still non-trivial) contract: the panel renders and does not
+ * crash the app. Every OTHER tab must render real content.
  */

-import { test, expect } from "@playwright/test";
+import { test, expect, type Page } from "@playwright/test";

 // Tab ids as declared in canvas/src/components/SidePanel.tsx TABS.
+//
+// NOTE (drift guard): this list is asserted-complete against the live DOM
+// below (see "tab list parity" step) so it cannot silently drift out of
+// sync with SidePanel.tsx TABS the way a hand-maintained constant does.
+// `display` and `container-config` are intentionally EXCLUDED here:
+// - `display` is owned by the in-flight take-control e2e (PR #2275 /
+// staging-display.spec.ts); asserting it here would collide.
+// - `container-config` only renders when selectedNodeId is set AND is
+// gated on tier; it is covered by container-config-specific specs.
+// The parity check accounts for these via EXPECTED_EXTRA_TABS so a NEW
+// tab appearing in SidePanel still trips the guard.
 const TAB_IDS = [
   "chat",
   "activity",
@@ -37,12 +60,131 @@ const TAB_IDS = [
   "audit",
 ] as const;

+// Tabs present in the DOM that this spec intentionally does not drive.
+// Keeping this explicit means a genuinely-new tab (not one of these) makes
+// the parity assertion fail LOUD instead of being silently un-tested.
+const EXPECTED_EXTRA_TABS = ["display", "container-config"] as const;
+
+// Tabs that are KNOWN to degrade in SaaS mode (#1369). They get the weaker
+// "renders + no crash" contract instead of the "real content" contract.
+// Anything NOT in this set must render real content or the test fails.
+const KNOWN_DEGRADED_TABS = new Set<string>(["terminal", "files"]);
+
 const STAGING = process.env.CANVAS_E2E_STAGING === "1";

-test.skip(!STAGING, "CANVAS_E2E_STAGING not set — skipping staging-only tests");
+// IMPORTANT — fail-closed, not skip-green.
+//
+// `test.skip(!STAGING)` is correct ONLY when the operator never asked for a
+// staging run (CANVAS_E2E_STAGING unset). In that case the workflow's
+// detect-changes / token-check gates have already decided not to exercise
+// staging, and skipping is the documented contract.
+//
+// But if STAGING *is* requested (CANVAS_E2E_STAGING=1) and global setup did
+// NOT hand off the tenant state, that is a HARD failure, not a skip — see
+// the explicit env-presence throw inside the test body. A silent skip there
+// would let a broken provision ship green, which is exactly the
+// weak-gate failure this hardening removes (§ No flakes / internal#828).
+test.skip(!STAGING, "CANVAS_E2E_STAGING not set — staging-only suite, not requested");
+
+/**
+ * Assert the panel for `tabId` rendered real content.
+ *
+ * Deterministic contract (no fixed waits — every step is condition-based
+ * with Playwright's built-in retry / expect.poll):
+ * 1. The tabpanel container is visible.
+ * 2. The global ErrorBoundary did NOT trip ("Something went wrong").
+ * 3. No visible error alert is shown in the panel.
+ * 4. For non-degraded tabs: the panel settles to non-empty,
+ * non-spinner content (so an empty <div/> or a stuck "Loading…"
+ * spinner FAILS instead of passing as it did before).
+ */
+async function assertPanelRendered(page: Page, tabId: string): Promise<void> {
+  const panel = page.locator(`#panel-${tabId}`);
+
+  // (1) Container visible. Built-in retry up to the expect timeout — no
+  // arbitrary waitForTimeout. Mechanism: replaces any reliance on a fixed
+  // settle delay with a real visibility condition.
+  await expect(panel, `panel for ${tabId} never became visible`).toBeVisible({
+    timeout: 10_000,
+  });
+
+  // (2) ErrorBoundary trip = hard crash anywhere in the React subtree.
+  // canvas/src/components/ErrorBoundary.tsx renders "Something went wrong".
+  // The OLD gate only looked for a "Failed to load" toast and would ship
+  // an ErrorBoundary-crashed panel GREEN. Mechanism: assert the crash
+  // surface is absent, retried via expect.poll so a late-mounting crash
+  // banner is still caught.
+  await expect
+    .poll(
+      async () =>
+        page.getByText("Something went wrong", { exact: false }).count(),
+      {
+        message: `tab ${tabId}: ErrorBoundary tripped (Something went wrong)`,
+        timeout: 5_000,
+      },
+    )
+    .toBe(0);
+
+  // (3) No visible error alert inside the panel. Tabs surface load errors
+  // as role="alert" with the real error text (EventsTab/ChannelsTab/
+  // ConfigTab/...). The OLD gate matched ONLY [role=alert]:has-text("Failed
+  // to load") — it missed (a) error messages that don't contain that exact
+  // phrase and (b) error divs that omit role="alert" entirely (e.g.
+  // ActivityTab). We replace it with a broader, but still SaaS-gap-aware,
+  // check: any *visible* alert OR red error banner inside the panel.
+  //
+  // Degraded tabs (#1369) are allowed an error state — for those we only
+  // require no app-level crash (covered by step 2). For every other tab a
+  // visible error alert is a real regression.
+  if (!KNOWN_DEGRADED_TABS.has(tabId)) {
+    const visibleAlerts = panel.locator('[role="alert"]:visible');
+    await expect
+      .poll(async () => visibleAlerts.count(), {
+        message:
+          `tab ${tabId}: a visible error alert is shown in the panel ` +
+          `(was a weak "Failed to load"-only check before)`,
+        timeout: 5_000,
+      })
+      .toBe(0);
+  }
+
+  // (4) Real content. The tabpanel CONTAINER always mounts, so the old
+  // toBeVisible() on the container passed even when the child rendered
+  // nothing. Assert the panel's trimmed innerText is non-empty AND not
+  // stuck on a loading spinner. expect.poll retries until the async
+  // fetch+render settles — replacing the implicit "the network finished
+  // by now" timing assumption with an explicit polled condition.
+  //
+  // Degraded tabs may legitimately be empty (Files in SaaS mode), so they
+  // are exempt from the non-empty requirement; step 2 still guards them
+  // against a hard crash.
+  if (!KNOWN_DEGRADED_TABS.has(tabId)) {
+    await expect
+      .poll(
+        async () => {
+          const text = ((await panel.innerText()) || "").trim();
+          // A panel still showing only a loading spinner has not settled.
+          const stillLoading = /^(loading\b|loading…|loading\.\.\.)/i.test(
+            text,
+          );
+          return text.length > 0 && !stillLoading;
+        },
+        {
+          message:
+            `tab ${tabId}: panel rendered empty or stuck on a loading ` +
+            `spinner — no real content settled (weak "container visible" ` +
+            `gate would have passed this)`,
+          // Generous: real tabs fetch from the tenant over the network.
+          // Polled, so it returns as soon as content appears.
+          timeout: 20_000,
+        },
+      )
+      .toBe(true);
+  }
+}

 test.describe("staging canvas tabs", () => {
-  test("each workspace-panel tab renders without error", async ({
+  test("each workspace-panel tab renders real content", async ({
     page,
     context,
   }) => {
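The "real content" condition in step (4) of assertPanelRendered is a pure predicate on the panel's text, so it can be read on its own. Below is a minimal sketch that reuses the regular expression from the hunk above; the `hasSettledContent` name and the `LOADING_PREFIX` constant are hypothetical, and the sample strings are made up for illustration.

```typescript
// Same pattern as in the spec: text that *starts with* "loading" (any case),
// optionally followed by an ellipsis, is treated as a not-yet-settled panel.
const LOADING_PREFIX = /^(loading\b|loading…|loading\.\.\.)/i;

// Hypothetical helper: true only for non-empty text that is not a loading stub.
function hasSettledContent(rawText: string): boolean {
  const text = rawText.trim();
  return text.length > 0 && !LOADING_PREFIX.test(text);
}

// Illustrative inputs (not taken from the real canvas UI).
const samples = ["", "   ", "Loading…", "loading...", "Events (3)", "Loaded 3 rows"];
for (const sample of samples) {
  console.log(JSON.stringify(sample), hasSettledContent(sample));
}
```

One consequence of anchoring the pattern at the start of the text is that a panel whose real content happens to begin with the word "Loading" would also be treated as unsettled; whether that matters depends on the actual tab copy, which this sketch does not know.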
@@ -50,9 +192,16 @@ test.describe("staging canvas tabs", () => {
     const tenantToken = process.env.STAGING_TENANT_TOKEN;
     const workspaceId = process.env.STAGING_WORKSPACE_ID;

+    // FAIL-CLOSED (not skip): STAGING was requested but global setup did
+    // not export tenant state. A silent skip here would paint a broken
+    // provision GREEN. This is the loud-fail the hardening mandates.
     if (!tenantURL || !tenantToken || !workspaceId) {
       throw new Error(
-        "staging-setup.ts did not export STAGING_TENANT_URL / STAGING_TENANT_TOKEN / STAGING_WORKSPACE_ID — did global setup run?",
+        "staging-setup.ts did not export STAGING_TENANT_URL / " +
+          "STAGING_TENANT_TOKEN / STAGING_WORKSPACE_ID. CANVAS_E2E_STAGING=1 " +
+          "was set (staging WAS requested) but global setup produced no " +
+          "tenant — this is a provisioning failure, NOT a reason to skip. " +
+          "Check the [staging-setup] log above for the real error.",
       );
     }

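The skip-versus-fail contract described in these hunks reduces to a small decision over the environment. The following is a sketch under stated assumptions: `decideStagingGate` and `GateDecision` are hypothetical names, and the real spec expresses the same logic through `test.skip` plus a throw inside the test body rather than a single function.

```typescript
type GateDecision = "skip" | "run";

// Hypothetical helper: skip only when staging was never requested;
// when it was requested, missing tenant state is an error, never a skip.
function decideStagingGate(env: Record<string, string | undefined>): GateDecision {
  if (env.CANVAS_E2E_STAGING !== "1") {
    return "skip"; // operator did not ask for a staging run
  }
  const required = [
    "STAGING_TENANT_URL",
    "STAGING_TENANT_TOKEN",
    "STAGING_WORKSPACE_ID",
  ];
  const absent = required.filter((name) => !env[name]);
  if (absent.length > 0) {
    throw new Error(`staging requested but tenant state missing: ${absent.join(", ")}`);
  }
  return "run";
}
```

Keeping "not requested" and "requested but broken" as separate outcomes is what stops a failed provision from being reported as a clean skip.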
@@ -152,11 +301,19 @@ test.describe("staging canvas tabs", () => {
     // omit the URL, so we'd otherwise be flying blind. Logged to the
     // test's stdout (visible in the workflow log under the failed step).
     page.on("requestfailed", (req) => {
-      console.log(`[e2e/requestfailed] ${req.method()} ${req.url()}: ${req.failure()?.errorText ?? "?"}`);
+      console.log(
+        `[e2e/requestfailed] ${req.method()} ${req.url()}: ${
+          req.failure()?.errorText ?? "?"
+        }`,
+      );
     });
     page.on("response", (res) => {
       if (res.status() >= 400) {
-        console.log(`[e2e/response-${res.status()}] ${res.request().method()} ${res.url()}`);
+        console.log(
+          `[e2e/response-${res.status()}] ${res
+            .request()
+            .method()} ${res.url()}`,
+        );
       }
     });

@@ -173,9 +330,8 @@ test.describe("staging canvas tabs", () => {
     // hydrated, even with zero workspaces) or the hydration-error
     // banner — whichever wins first. Previous version of this wait
     // used `[role="tablist"]`, but that selector only appears AFTER
-    // a workspace node is clicked (which happens below at L100), so
-    // the wait would always time out at 45s before any meaningful
-    // failure surfaced.
+    // a workspace node is clicked, so the wait would always time out
+    // at 45s before any meaningful failure surfaced.
     await page.waitForSelector(
       '[aria-label="Molecule AI workspace canvas"], [data-testid="hydration-error"]',
       { timeout: 45_000 },
@@ -189,10 +345,20 @@ test.describe("staging canvas tabs", () => {
       "canvas hydration failed — check staging CP + tenant reachability",
     ).toBe(0);

+    // The global ErrorBoundary must not have tripped at the app root
+    // either — a crash before the side panel even opens would otherwise
+    // be invisible until a tab assertion happened to notice it.
+    await expect(
+      page.getByText("Something went wrong", { exact: false }),
+      "app-level ErrorBoundary tripped during hydration",
+    ).toHaveCount(0);
+
     // Click the workspace node to open the side panel. Try a data
     // attribute first, fall back to a generic role-based selector so
     // the test doesn't break when the node-card markup changes.
-    const byDataAttr = page.locator(`[data-workspace-id="${workspaceId}"]`).first();
+    const byDataAttr = page
+      .locator(`[data-workspace-id="${workspaceId}"]`)
+      .first();
     if ((await byDataAttr.count()) > 0) {
       await byDataAttr.click({ timeout: 10_000 });
     } else {
@@ -202,19 +368,56 @@ test.describe("staging canvas tabs", () => {
       await firstNode.click({ timeout: 10_000 });
     }

-    await page.waitForSelector('[role="tablist"]', { timeout: 15_000 });
+    // The tablist appears once the side panel mounts. Condition-based
+    // wait — no fixed delay.
+    const tablist = page.locator('[role="tablist"]');
+    await expect(
+      tablist,
+      "side panel tablist never appeared after clicking the workspace node",
+    ).toBeVisible({ timeout: 15_000 });
+
+    // Tab-list parity guard. The hand-maintained TAB_IDS constant used to
+    // be able to drift silently out of sync with SidePanel.tsx TABS — a
+    // tab could be added to the UI and never get an assertion, shipping
+    // broken-but-untested. Read the actual tab ids from the DOM and assert
+    // every live tab is either driven by this spec (TAB_IDS) or explicitly
+    // excluded (EXPECTED_EXTRA_TABS). A genuinely-new tab fails LOUD.
+    const liveTabIds = (
+      await tablist.locator('[role="tab"][id^="tab-"]').evaluateAll((els) =>
+        els.map((el) => el.id.replace(/^tab-/, "")),
+      )
+    ).sort();
+    const accountedFor = new Set<string>([
+      ...TAB_IDS,
+      ...EXPECTED_EXTRA_TABS,
+    ]);
+    const unaccounted = liveTabIds.filter((id) => !accountedFor.has(id));
+    expect(
+      unaccounted,
+      `SidePanel exposes tab(s) this spec neither drives nor excludes: ` +
+        `${unaccounted.join(", ")}. Add them to TAB_IDS (and assert their ` +
+        `content) or to EXPECTED_EXTRA_TABS with a reason.`,
+    ).toHaveLength(0);
+    // And the inverse: every TAB_ID we intend to drive must actually exist
+    // in the DOM, so a renamed/removed tab fails here instead of timing out
+    // on a missing #tab-<id> selector with an opaque message.
+    const missing = TAB_IDS.filter((id) => !liveTabIds.includes(id));
+    expect(
+      missing,
+      `TAB_IDS references tab(s) not present in SidePanel: ${missing.join(
+        ", ",
+      )} — the spec's tab list has drifted from SidePanel.tsx TABS.`,
+    ).toHaveLength(0);

     for (const tabId of TAB_IDS) {
       await test.step(`tab: ${tabId}`, async () => {
         const tabButton = page.locator(`#tab-${tabId}`);
-        // The TABS bar is `overflow-x-auto` (SidePanel.tsx:~tabs
-        // wrapper) — tabs after position ~3 are clipped behind the
-        // right-edge fade gradient on smaller viewports. Playwright's
-        // `toBeVisible()` returns false for clipped elements, so a
-        // bare visibility check fails on `skills` and later tabs in
-        // CI. scrollIntoViewIfNeeded brings the button into view
-        // before the visibility check, mirroring what SidePanel's own
-        // keyboard handler does on arrow-key navigation.
+        // The TABS bar is `overflow-x-auto` — tabs past position ~3 are
+        // clipped behind the right-edge fade gradient on smaller
+        // viewports. Playwright's toBeVisible() returns false for clipped
+        // elements, so a bare visibility check fails on later tabs in CI.
+        // scrollIntoViewIfNeeded brings the button into view before the
+        // visibility check.
         await tabButton.scrollIntoViewIfNeeded({ timeout: 5_000 });
         await expect(
           tabButton,
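The two-way parity guard in the hunk above is set arithmetic over three lists of tab ids. The following sketch isolates that logic from Playwright; `checkTabParity` and `ParityResult` are hypothetical names, and the ids in the usage lines are illustrative (`"metrics"` in particular is made up).

```typescript
interface ParityResult {
  unaccounted: string[]; // live in the DOM, neither driven nor excluded
  missing: string[]; // driven by the spec, but absent from the DOM
}

// Hypothetical pure version of the guard: both result lists must be empty.
function checkTabParity(
  liveTabIds: readonly string[],
  driven: readonly string[],
  excluded: readonly string[],
): ParityResult {
  const accountedFor = new Set<string>([...driven, ...excluded]);
  const live = new Set<string>(liveTabIds);
  return {
    unaccounted: liveTabIds.filter((id) => !accountedFor.has(id)),
    missing: driven.filter((id) => !live.has(id)),
  };
}

// A new tab ("metrics") and a removed one ("audit") both surface by name.
const result = checkTabParity(
  ["chat", "activity", "display", "metrics"],
  ["chat", "activity", "audit"],
  ["display", "container-config"],
);
console.log(result.unaccounted, result.missing);
```

Note that an excluded id that is absent from the DOM (here `container-config`) is not reported, which matches the spec's treatment of tabs that only render under certain conditions.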
@@ -222,18 +425,34 @@ test.describe("staging canvas tabs", () => {
         ).toBeVisible({ timeout: 5_000 });
         await tabButton.click();

-        const panel = page.locator(`#panel-${tabId}`);
-        await expect(panel, `panel for ${tabId} never rendered`).toBeVisible({
-          timeout: 10_000,
-        });
+        // Confirm the click actually activated this tab before asserting
+        // its content — aria-selected flips on the active tab. This closes
+        // a race where a slow click handler left the PREVIOUS tab's panel
+        // mounted and we asserted the wrong panel's content. Built-in
+        // retry, condition-based, no fixed wait.
+        await expect(
+          tabButton,
+          `tab-${tabId} did not become the selected tab after click`,
+        ).toHaveAttribute("aria-selected", "true", { timeout: 5_000 });

-        // "Failed to load" toast = hard crash. Known SaaS-mode gaps
-        // (Files empty, Terminal disconnected, Peers 401) surface as
-        // in-panel content, not toasts.
+        // Real-content assertion (the core hardening). See
+        // assertPanelRendered: container visible + no ErrorBoundary + no
+        // visible error alert + settled non-empty content for non-degraded
+        // tabs. Replaces the old "panel visible + no Failed-to-load toast"
+        // pair, which shipped empty/errored panels green.
+        await assertPanelRendered(page, tabId);
+
+        // Belt to the braces: the original toast check stays. A global
+        // "Failed to load" toast (role=alert outside the panel) is still a
+        // crash signal worth catching even though the in-panel checks above
+        // now do the heavy lifting.
         const errorToasts = await page
           .locator('[role="alert"]:has-text("Failed to load")')
           .count();
-        expect(errorToasts, `tab ${tabId}: "Failed to load" toast`).toBe(0);
+        expect(
+          errorToasts,
+          `tab ${tabId}: a global "Failed to load" toast is showing`,
+        ).toBe(0);

         await page.screenshot({
           path: `test-results/staging-tab-${tabId}.png`,
@@ -267,3 +486,56 @@ test.describe("staging canvas tabs", () => {
     ).toHaveLength(0);
   });
 });
+
+/*
+ * PROMOTION-READINESS — staging canvas E2E → HARD merge-gate
+ * ----------------------------------------------------------
+ * NOW RELIABLE (deterministic; these no longer flap on timing):
+ * - Every wait is condition-based (toBeVisible / toHaveAttribute /
+ * expect.poll). There is NO fixed waitForTimeout / sleep in the spec;
+ * the only setTimeout is the bounded poll-interval inside
+ * staging-setup.ts waitFor(), which has a hard deadline.
+ * - Tabs are asserted on REAL settled content (non-empty, non-spinner),
+ * not just "container is visible" — an empty or stuck-loading panel now
+ * fails instead of shipping green.
+ * - The ErrorBoundary ("Something went wrong") is asserted absent at app
+ * hydration AND per tab — a React subtree crash can no longer pass.
+ * - Visible error alerts inside a panel fail non-degraded tabs (was a
+ * weak [role=alert]:has-text("Failed to load")-only check that missed
+ * both other error phrasings and role-less error divs).
+ * - The driven tab list is parity-checked against the live DOM, so a new
+ * SidePanel tab can't ship un-tested and a removed one fails loud.
+ * - Click→activation is confirmed (aria-selected) before asserting the
+ * panel, removing a wrong-panel race.
+ * - The suite is fail-closed: CANVAS_E2E_STAGING=1 with no tenant state
+ * hard-errors (never skips→green); CANVAS_E2E_STAGING unset cleanly
+ * skips (operator did not request staging).
+ *
+ * STILL BLOCKS PROMOTION-TO-REQUIRED (do NOT flip continue-on-error here —
+ * CTO-owned, RFC internal#219 §1):
+ * - INFRA DEPENDENCY: each run provisions a real staging EC2 tenant
+ * (12-20 min cold boot). Required-gate latency + AWS/Cloudflare/CP
+ * availability become merge-blockers. A staging outage would freeze
+ * main even though the code is fine — unacceptable for a required check
+ * until staging has an SLA or this runs against a warm pre-provisioned
+ * pool.
+ * - SHARED-RESOURCE FLAKE SURFACE: TLS/DNS/ACME propagation on a shared
+ * staging zone (staging-setup TLS_TIMEOUT_MS) is outside this repo's
+ * control. Deterministic here ≠ deterministic upstream.
+ * - SECRET DEPENDENCY: CP_STAGING_ADMIN_API_TOKEN must be present on the
+ * runner. The workflow's skip-if-absent (core#2225) keeps a missing
+ * secret from painting red — correct for non-gating, but a REQUIRED
+ * check must instead guarantee the secret is always present, else it
+ * skip-greens the very thing it is supposed to enforce.
+ * - SINGLE-WORKSPACE COVERAGE: one hermes/platform_managed workspace that
+ * does NOT boot an agent on staging (no CP LLM proxy env, workspace-
+ * server #2162). Tabs render, but agent-dependent content paths (live
+ * chat round-trip, traces from a real run) are not exercised.
+ *
+ * PROMOTION CHECKLIST (when CTO signs off on making this required):
+ * 1. Warm pre-provisioned tenant pool OR a staging SLA bounding boot time.
+ * 2. Guarantee CP_STAGING_ADMIN_API_TOKEN on the gating runner; turn the
+ * skip-if-absent into a hard error for the required path.
+ * 3. Decide whether agent-dependent tabs need a wired LLM proxy on the
+ * staging tenant (covers chat/traces real content) before gating them.
+ */