docs: testing strategy + PR hygiene + backend parity matrix + boot-event postmortem (#1824)

Bundles the documentation and lightweight tooling landed during the
2026-04-23 ops/triage session. Pure additions — no behavior changes.

## Added

### docs/architecture/backends.md
Parity matrix for Docker vs EC2 (SaaS) workspace backends. 18 features
tabulated with current status; 6 ranked drift risks; enforcement
hooks (parity-lint + contract tests). Living document — owners are
workspace-server + controlplane teams.

### docs/engineering/testing-strategy.md
Tiered test-coverage floors instead of a blanket 100% target. Seven
tiers by code class (auth/crypto → generated DTOs). Per-package
current-state snapshot + targets. Tracks the 3 biggest coverage gaps
(tokens.go 0%, workspace_provision.go 0%, wsauth ~48%) against their
tier-1/2 floors.

### docs/engineering/pr-hygiene.md
Captures the patterns that keep diffs reviewable. Motivated by the
2026-04-23 backlog audit where 8 of 23 open PRs had 70-380-file bloat
from stale branch drift. Covers: small-PR sizing, rebase-not-merge,
cherry-pick-onto-fresh-base for recovery, targeting staging first,
describing why-not-what.

### docs/engineering/postmortem-2026-04-23-boot-event-401.md
Postmortem for the /cp/tenants/boot-event 401 race. Root cause (DB
INSERT ordered AFTER readiness check), detection path (E2E + manual
log inspection), lessons (write-before-read pattern, integration
tests needed, E2E alerting gap, invariants-as-comments).

### tools/check-template-parity.sh
CI lint for template repos — diffs the `${VAR:+VAR=${VAR}}` provider-
key forwarders between install.sh (bare-host / EC2 path) and start.sh
(Docker path). Catches the #5 drift risk from backends.md before it
ships.

### workspace-server/internal/provisioner/backend_contract_test.go
Shared behavioral contract scaffold for Provisioner + CPProvisioner.
Compile-time assertions catch method-signature drift today; scenario-
level runs are t.Skip'd pending backend nil-hardening (drift risk #6,
see backends.md).

## Updated

### README.md
Links the new engineering docs + backends parity matrix into the
Documentation Map so agents and humans can actually find them.

## Related issues

- #1814 — unblock workspace_provision_test.go (broadcaster interface)
- #1813 — nil-client panic hardening (drift risk #6)
- #1815 — Canvas vitest coverage instrumentation
- #1816 — tokens.go 0% → 85%
- #1817 — 5 sqlmock column-drift failures
- #1818 — Python pytest-cov setup
- #1819 — wsauth middleware coverage gap
- #1821 — tiered coverage policy (meta)
- #1822 — backend parity drift tracker

Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>

2026-04-23 19:59:38 +00:00


Testing Strategy

Status: Policy. Update when tier definitions or thresholds change.
Audience: Everyone writing or reviewing code in this repo.
Cross-refs: backends.md, pr-hygiene.md, postmortem-2026-04-23-boot-event-401.md

The short version

Don't chase 100% coverage. The last 15-20% costs as much as the first 80% and mostly adds brittle tests of trivial getters, error branches that can't fire, and stdlib wrappers.
Different code classes have different floors. Auth at 80% is scarier than a DTO at 50%. Match the test investment to the risk.
Tests should pay rent. A test that runs lines but asserts nothing meaningful isn't catching bugs — it's just dragging refactors down.

Tiered coverage floors

Every Go package, every TypeScript module, every Python module fits one of these tiers. The tier determines the minimum acceptable coverage — and the review standard.

| Tier | Examples | Line floor | Branch floor | Review standard |
|---|---|---|---|---|
| 1. Auth / secrets / crypto | `tokens`, `session_auth`, `wsauth_middleware`, `crypto/envelope`, `cp_tenant_auth` | 90% | 85% | Every branch tested. Adversarial scenarios (cross-tenant, expired token, null origin, malformed header). Timing considered. |
| 2. Handlers with side effects | `workspace_provision`, `workspace_crud`, `container_files`, `terminal`, `registry` | 75% | 70% | Happy + main error paths. DB mocks. Ownership / tenant-isolation checks. |
| 3. State machines + workers | `scheduler`, `provisioner`, `healthsweep`, `orphan-sweeper`, `boot_ready` | 75% | 70% | Every state transition tested, plus the transitions that shouldn't fire. |
| 4. Config / business logic | `budget`, `orgtoken` (validation), `templates`, `derive-provider`, `redaction` | 70% | 65% | Standard unit-test territory. Table-driven preferred. |
| 5. Plain DTOs / generated | `models/*`, proto-generated Go, TypeScript interfaces | none | none | Writing tests here is theatre. Don't. |
| 6. CLI glue / cmd/* | `cmd/server`, `cmd/molecli` | smoke only | - | Integration tests / E2E cover these. One startup-smoke test per binary. |
| 7. Third-party wrappers | `awsapi`, `cloudflareapi`, `stripeapi`, `neonapi` | integration | - | Unit tests mock vendor shape, not behavior. Real behavior covered by staging integration. |
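
The numeric floors in the table can be expressed as a small lookup. The following is a minimal sketch, not code from the repo: the names `Floor`, `tierFloors`, and `LineGap` are hypothetical, and tiers 5 to 7 are treated as ungated because the table gives them no percentage.

```go
package main

import "fmt"

// Floor holds the minimum line and branch coverage for a tier, in percent.
// Hypothetical type for illustration only.
type Floor struct {
	Line, Branch float64
}

// tierFloors mirrors the numeric rows of the tier table.
// Tiers 5-7 are deliberately absent: they have no percentage floor.
var tierFloors = map[int]Floor{
	1: {Line: 90, Branch: 85},
	2: {Line: 75, Branch: 70},
	3: {Line: 75, Branch: 70},
	4: {Line: 70, Branch: 65},
}

// LineGap returns how many points a package's line coverage sits below
// its tier floor. It returns 0 when the floor is met or the tier is ungated.
func LineGap(tier int, actual float64) float64 {
	f, ok := tierFloors[tier]
	if !ok || actual >= f.Line {
		return 0
	}
	return f.Line - actual
}

func main() {
	fmt.Println(LineGap(1, 48))  // tier 1 at 48%: prints 42
	fmt.Println(LineGap(4, 88))  // tier 4 at 88%, above the floor: prints 0
	fmt.Println(LineGap(5, 100)) // tier 5 has no floor: prints 0
}
```

Keeping the ungated tiers out of the map means a new tier without a numeric floor needs no code change.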

Why a blanket percentage is wrong

A `models/` package at 90% means you wrote tests for `func (w Workspace) ID() string { return w.id }`. No bugs caught, but the coverage number is green.
A `tokens` package at 75% means some rejection branch isn't covered, possibly the exact branch that lets a revoked token still authenticate.
Blanket targets make the first case look equivalent to the second. They aren't.

Current state (as of 2026-04-23)

Run `go test ./... -cover` in each repo for up-to-date numbers. The tables below are a snapshot taken on that date:

workspace-server (Go)

| Package / file | Actual | Tier | Target | Gap (points) |
|---|---|---|---|---|
| `internal/handlers/tokens.go` | 0% | 1 | 90% | 90 |
| `internal/handlers/workspace_provision.go` | 0% | 2 | 75% | 75 |
| `internal/middleware/wsauth_middleware.go` | ~48% | 1 | 90% | 42 |
| `internal/provisioner` | 45% | 3 | 75% | 30 |
| `internal/scheduler` | 49% | 3 | 75% | 26 |
| `internal/channels` | 40% | 4 | 70% | 30 |
| `internal/orgtoken` | 88% | 4 | 70% | - |
| `internal/crypto` | 91% | 1 | 90% | - |
| `internal/supervised` | 93% | 3 | 75% | - |
| `internal/plugins` | 94% | 4 | 70% | - |
| `internal/envx` | 100% | 5 | none | - |

molecule-controlplane (Go)

| Package | Actual | Tier | Target | Gap (points) |
|---|---|---|---|---|
| `internal/awsapi` | 18% | 7 | integration | - |
| `internal/provisioner` | 48% | 3 | 75% | 27 |
| `internal/handlers` | 60% | 2 | 75% | 15 |
| `internal/billing` | 60% | 4 | 70% | 10 |
| `internal/crypto` | 68-80% | 1 | 90% | 10-22 |
| `internal/auth` | 96% | 1 | 90% | - |
| `internal/middleware` | 97% | 1 | 90% | - |
| `internal/reserved` | 100% | 5 | none | - |
| `internal/httpx` | 100% | 4 | 70% | - |

canvas (TypeScript)

No coverage instrumentation today. 900 tests across 58 files pass, but coverage isn't measured. See issue #1815 for the fix: set a 70% line floor in `vitest.config.ts` and gate CI on it.

workspace (Python)

No pytest/coverage config. See issue #1818: set up pytest-cov with --cov-fail-under=75 (ratchet from current baseline over 2-3 weeks).

Writing a good test

A good test:

Asserts a specific outcome, not that a function runs without error.
Covers the exact branch that bugs would live in — cross-tenant access, revoked-but-cached token, race on state transition.
Uses table-driven patterns when the code is a dispatch with N cases. One test row per case.
Mocks at system boundaries (DB, HTTP, time), not at internal package boundaries.
Survives refactors — tests behavior, not internal state.

A bad test:

Tests a getter that just returns a field.
Mocks the function under test itself.
Relies on time.Sleep or clock timing to assert order.
Asserts nil == nil to boost coverage.

Enforcement

CI gates

Go: `go test ./... -cover`, plus a pre-commit script that compares coverage to `.coverage-baseline` and fails when a tier-1 package drops by more than 2 points.
TypeScript: `vitest --coverage` with thresholds in `vitest.config.ts`. CI fails when coverage falls below them.
Python: `pytest --cov-fail-under=75` in the Python CI job.
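
The Go baseline comparison could look roughly like the sketch below. It is an assumption-laden illustration, not the actual script: the format of `.coverage-baseline` is not specified here, so the baseline is modelled as an in-memory map of tier-1 package paths to percentages, and `ParseCoverage` and `Drops` are hypothetical names. The parser relies on the `coverage: N% of statements` suffix that `go test -cover` prints for each passing package. That figure is statement coverage, which the sketch uses as the line figure.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// coverLine matches a passing-package line from `go test ./... -cover`, e.g.
//   ok  	example.com/internal/crypto	0.012s	coverage: 91.0% of statements
var coverLine = regexp.MustCompile(`^ok\s+(\S+)\s+.*coverage:\s+([0-9.]+)% of statements`)

// ParseCoverage extracts package path -> statement coverage percent.
// Lines that don't match (failures, "no test files") are ignored.
func ParseCoverage(output string) map[string]float64 {
	got := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		m := coverLine.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		if pct, err := strconv.ParseFloat(m[2], 64); err == nil {
			got[m[1]] = pct
		}
	}
	return got
}

// Drops lists baseline packages whose coverage fell by more than maxDrop
// points. Packages missing from current are skipped, not treated as drops.
func Drops(baseline, current map[string]float64, maxDrop float64) []string {
	var out []string
	for pkg, base := range baseline {
		if cur, ok := current[pkg]; ok && base-cur > maxDrop {
			out = append(out, fmt.Sprintf("%s: %.1f%% -> %.1f%%", pkg, base, cur))
		}
	}
	sort.Strings(out) // stable output regardless of map iteration order
	return out
}

func main() {
	// Hypothetical baseline; a real script would load it from .coverage-baseline.
	baseline := map[string]float64{"example.com/internal/crypto": 91.0}
	output := "ok  \texample.com/internal/crypto\t0.012s\tcoverage: 88.0% of statements\n"
	for _, d := range Drops(baseline, ParseCoverage(output), 2) {
		fmt.Println("coverage drop:", d)
	}
}
```

A real gate would exit non-zero when `Drops` returns anything; the 2-point tolerance absorbs small fluctuations from unrelated refactors.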

Review expectations

Any PR touching a tier-1 package that lowers its coverage needs an explicit reviewer sign-off and justification.
New code should arrive at or above its tier's floor.
Untested files in tier-1 or tier-2 should be flagged in review, not waved through.

Issue #1821 — policy tracking issue
Issue #1815 — Canvas coverage instrumentation
Issue #1818 — Python pytest-cov
Issue #1814 — workspace_provision_test.go unblock
Issue #1816 — tokens.go coverage
Issue #1819 — wsauth_middleware coverage
