molecule-core/docs/engineering/testing-strategy.md
Hongming Wang a56b765b2d
docs: testing strategy + PR hygiene + backend parity matrix + boot-event postmortem (#1824)
Bundles the documentation and lightweight tooling landed during the
2026-04-23 ops/triage session. Pure additions — no behavior changes.

## Added

### docs/architecture/backends.md
Parity matrix for Docker vs EC2 (SaaS) workspace backends. 18 features
tabulated with current status; 6 ranked drift risks; enforcement
hooks (parity-lint + contract tests). Living document — owners are
workspace-server + controlplane teams.

### docs/engineering/testing-strategy.md
Tiered test-coverage floors instead of a blanket 100% target. Seven
tiers by code class (auth/crypto → generated DTOs). Per-package
current-state snapshot + targets. Tracks the 3 biggest coverage gaps
(tokens.go 0%, workspace_provision.go 0%, wsauth ~48%) against their
tier-1/2 floors.

### docs/engineering/pr-hygiene.md
Captures the patterns that keep diffs reviewable. Motivated by the
2026-04-23 backlog audit where 8 of 23 open PRs had 70-380-file bloat
from stale branch drift. Covers: small-PR sizing, rebase-not-merge,
cherry-pick-onto-fresh-base for recovery, targeting staging first,
describing why-not-what.

### docs/engineering/postmortem-2026-04-23-boot-event-401.md
Postmortem for the /cp/tenants/boot-event 401 race. Root cause (DB
INSERT ordered AFTER readiness check), detection path (E2E + manual
log inspection), lessons (write-before-read pattern, integration
tests needed, E2E alerting gap, invariants-as-comments).

### tools/check-template-parity.sh
CI lint for template repos — diffs the `${VAR:+VAR=${VAR}}` provider-
key forwarders between install.sh (bare-host / EC2 path) and start.sh
(Docker path). Catches the #5 drift risk from backends.md before it
ships.

### workspace-server/internal/provisioner/backend_contract_test.go
Shared behavioral contract scaffold for Provisioner + CPProvisioner.
Compile-time assertions catch method-signature drift today; scenario-
level runs are t.Skip'd pending backend nil-hardening (drift risk #6,
see backends.md).

## Updated

### README.md
Links the new engineering docs + backends parity matrix into the
Documentation Map so agents and humans can actually find them.

## Related issues

- #1814 — unblock workspace_provision_test.go (broadcaster interface)
- #1813 — nil-client panic hardening (drift risk #6)
- #1815 — Canvas vitest coverage instrumentation
- #1816 — tokens.go 0% → 85%
- #1817 — 5 sqlmock column-drift failures
- #1818 — Python pytest-cov setup
- #1819 — wsauth middleware coverage gap
- #1821 — tiered coverage policy (meta)
- #1822 — backend parity drift tracker

Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
2026-04-23 19:59:38 +00:00

112 lines
6.4 KiB
Markdown

# Testing Strategy
**Status:** Policy. Update when tier definitions or thresholds change.
**Audience:** Everyone writing or reviewing code in this repo.
**Cross-refs:** [backends.md](../architecture/backends.md), [pr-hygiene.md](./pr-hygiene.md), [postmortem-2026-04-23-boot-event-401.md](./postmortem-2026-04-23-boot-event-401.md)
## The short version
- **Don't chase 100% coverage.** The last 15-20% costs as much as the first 80% and mostly adds brittle tests of trivial getters, error branches that can't fire, and stdlib wrappers.
- **Different code classes have different floors.** Auth at 80% is scarier than a DTO at 50%. Match the test investment to the risk.
- **Tests should pay rent.** A test that runs lines but asserts nothing meaningful isn't catching bugs — it's just dragging refactors down.
## Tiered coverage floors
Every Go package, every TypeScript module, every Python module fits one of these tiers. The tier determines the minimum acceptable coverage — and the review standard.
| Tier | Examples | Line floor | Branch floor | Review standard |
|---|---|---|---|---|
| **1. Auth / secrets / crypto** | `tokens`, `session_auth`, `wsauth_middleware`, `crypto/envelope`, `cp_tenant_auth` | **90%** | **85%** | Every branch tested. Adversarial scenarios (cross-tenant, expired token, null origin, malformed header). Timing considered. |
| **2. Handlers with side effects** | `workspace_provision`, `workspace_crud`, `container_files`, `terminal`, `registry` | **75%** | 70% | Happy + main error paths. DB mocks. Ownership / tenant-isolation checks. |
| **3. State machines + workers** | `scheduler`, `provisioner`, `healthsweep`, `orphan-sweeper`, `boot_ready` | **75%** | 70% | Every state transition tested, plus the transitions that *shouldn't* fire. |
| **4. Config / business logic** | `budget`, `orgtoken` (validation), `templates`, `derive-provider`, `redaction` | **70%** | 65% | Standard unit-test territory. Table-driven preferred. |
| **5. Plain DTOs / generated** | `models/*`, proto-generated Go, TypeScript interfaces | none | none | Writing tests here is theatre. Don't. |
| **6. CLI glue / cmd/*** | `cmd/server`, `cmd/molecli` | smoke only | — | Integration tests / E2E cover these. One startup-smoke test per binary. |
| **7. Third-party wrappers** | `awsapi`, `cloudflareapi`, `stripeapi`, `neonapi` | integration | — | Unit tests mock vendor shape, not behavior. Real behavior covered by staging integration. |
### Why a blanket percentage is wrong
- A `models/` package at 90% means you wrote tests for `func (w Workspace) ID() string { return w.id }`. No bugs caught, but coverage number is green.
- A `tokens` package at 75% means some rejection branch isn't covered. Maybe the *exact* branch that lets a revoked token still authenticate.
- Blanket targets make the first case look equivalent to the second. They aren't.
## Current state (as of 2026-04-23)
Run `go test ./... -cover` in each repo for up-to-date numbers. Snapshot:
### workspace-server (Go)
| Package | Actual | Tier | Target | Gap |
|---|---:|---|---:|---:|
| `internal/handlers/tokens.go` | **0%** | 1 | 90% | 90 |
| `internal/handlers/workspace_provision.go` | **0%** | 2 | 75% | 75 |
| `internal/middleware/wsauth_middleware.go` | ~48% | 1 | 90% | 42 |
| `internal/provisioner` | 45% | 3 | 75% | 30 |
| `internal/scheduler` | 49% | 3 | 75% | 26 |
| `internal/channels` | 40% | 4 | 70% | 30 |
| `internal/orgtoken` | 88% | 4 | 70% | — |
| `internal/crypto` | 91% | 1 | 90% | — |
| `internal/supervised` | 93% | 3 | 75% | — |
| `internal/plugins` | 94% | 4 | 70% | — |
| `internal/envx` | 100% | 5 | none | — |
### molecule-controlplane (Go)
| Package | Actual | Tier | Target | Gap |
|---|---:|---|---:|---:|
| `internal/awsapi` | 18% | 7 | integration | — |
| `internal/provisioner` | 48% | 3 | 75% | 27 |
| `internal/handlers` | 60% | 2 | 75% | 15 |
| `internal/billing` | 60% | 4 | 70% | 10 |
| `internal/crypto` | 68-80% | 1 | 90% | 10-22 |
| `internal/auth` | 96% | 1 | 90% | — |
| `internal/middleware` | 97% | 1 | 90% | — |
| `internal/reserved` | 100% | 5 | none | — |
| `internal/httpx` | 100% | 4 | 70% | — |
### canvas (TypeScript)
**No coverage instrumentation today.** 900 tests / 58 files pass, but coverage isn't measured. See issue #1815 for the fix: set a 70% line floor in `vitest.config.ts` and gate CI on it.
### workspace (Python)
**No pytest/coverage config.** See issue #1818: set up `pytest-cov` with `--cov-fail-under=75` (ratchet from current baseline over 2-3 weeks).
## Writing a good test
A good test:
- **Asserts a specific outcome**, not that a function runs without error.
- **Covers the exact branch that bugs would live in** — cross-tenant access, revoked-but-cached token, race on state transition.
- **Uses table-driven patterns** when the code is a dispatch with N cases. One test row per case.
- **Mocks at system boundaries** (DB, HTTP, time), not at internal package boundaries.
- **Survives refactors** — tests behavior, not internal state.
A bad test:
- Tests a getter that just returns a field.
- Mocks the function under test itself.
- Relies on `time.Sleep` or clock timing to assert order.
- Asserts `nil == nil` to boost coverage.
## Enforcement
### CI gates
- **Go**: `go test ./... -cover` + a pre-commit script that compares coverage to `.coverage-baseline` and fails on drops > 2 points in a tier-1 package.
- **TypeScript**: `vitest --coverage` with thresholds in `vitest.config.ts`. Fails CI if below.
- **Python**: `pytest --cov-fail-under=75` in the Python CI job.
### Review expectations
- Any PR touching a tier-1 package that lowers its coverage needs an explicit reviewer sign-off and justification.
- New code should arrive at or above its tier's floor.
- Untested files in tier-1 or tier-2 should be flagged in review, not waved through.
## Related
- [Issue #1821](https://github.com/Molecule-AI/molecule-core/issues/1821) — policy tracking issue
- [Issue #1815](https://github.com/Molecule-AI/molecule-core/issues/1815) — Canvas coverage instrumentation
- [Issue #1818](https://github.com/Molecule-AI/molecule-core/issues/1818) — Python pytest-cov
- [Issue #1814](https://github.com/Molecule-AI/molecule-core/issues/1814) — workspace_provision_test.go unblock
- [Issue #1816](https://github.com/Molecule-AI/molecule-core/issues/1816) — tokens.go coverage
- [Issue #1819](https://github.com/Molecule-AI/molecule-core/issues/1819) — wsauth_middleware coverage