molecule-core/docs/engineering/testing-strategy.md
claude-ceo-assistant 3501e6bfd7
2026-05-07 22:37:42 +00:00


Testing Strategy

Status: Policy. Update when tier definitions or thresholds change.
Audience: Everyone writing or reviewing code in this repo.
Cross-refs: backends.md, pr-hygiene.md, postmortem-2026-04-23-boot-event-401.md

The short version

  • Don't chase 100% coverage. The last 15-20% costs as much as the first 80% and mostly adds brittle tests of trivial getters, error branches that can't fire, and stdlib wrappers.
  • Different code classes have different floors. Auth at 80% is scarier than a DTO at 50%. Match the test investment to the risk.
  • Tests should pay rent. A test that runs lines but asserts nothing meaningful isn't catching bugs — it's just dragging refactors down.

Tiered coverage floors

Every Go package, every TypeScript module, every Python module fits one of these tiers. The tier determines the minimum acceptable coverage — and the review standard.

| Tier | Examples | Line floor | Branch floor | Review standard |
|---|---|---|---|---|
| 1. Auth / secrets / crypto | tokens, session_auth, wsauth_middleware, crypto/envelope, cp_tenant_auth | 90% | 85% | Every branch tested. Adversarial scenarios (cross-tenant, expired token, null origin, malformed header). Timing considered. |
| 2. Handlers with side effects | workspace_provision, workspace_crud, container_files, terminal, registry | 75% | 70% | Happy + main error paths. DB mocks. Ownership / tenant-isolation checks. |
| 3. State machines + workers | scheduler, provisioner, healthsweep, orphan-sweeper, boot_ready | 75% | 70% | Every state transition tested, plus the transitions that shouldn't fire. |
| 4. Config / business logic | budget, orgtoken (validation), templates, derive-provider, redaction | 70% | 65% | Standard unit-test territory. Table-driven preferred. |
| 5. Plain DTOs / generated | models/*, proto-generated Go, TypeScript interfaces | none | none | Writing tests here is theatre. Don't. |
| 6. CLI glue / cmd/* | cmd/server, cmd/molecli | smoke only | | Integration tests / E2E cover these. One startup-smoke test per binary. |
| 7. Third-party wrappers | awsapi, cloudflareapi, stripeapi, neonapi | integration | | Unit tests mock vendor shape, not behavior. Real behavior covered by staging integration. |
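The floors above can be expressed as data so a script can check them mechanically. The following is a minimal sketch, not code from this repo: `Floor`, `tierFloors`, and `MeetsFloor` are hypothetical names. It also assumes a branch-coverage number is available from some other tool, since `go test -cover` reports statement coverage only.

```go
package main

import "fmt"

// Floor holds the minimum coverage percentages for one tier.
// Enforced is false for tiers 5-7, which have no numeric floor in the table.
type Floor struct {
	Line, Branch float64
	Enforced     bool
}

// tierFloors mirrors the tier table. Hypothetical name, not from the repo.
var tierFloors = map[int]Floor{
	1: {Line: 90, Branch: 85, Enforced: true},
	2: {Line: 75, Branch: 70, Enforced: true},
	3: {Line: 75, Branch: 70, Enforced: true},
	4: {Line: 70, Branch: 65, Enforced: true},
	5: {}, // plain DTOs / generated: no floor
	6: {}, // CLI glue: smoke test only
	7: {}, // third-party wrappers: covered by integration
}

// MeetsFloor reports whether measured line and branch coverage satisfy
// the floor for the given tier. Tiers without a numeric floor always pass.
func MeetsFloor(tier int, line, branch float64) (bool, error) {
	f, ok := tierFloors[tier]
	if !ok {
		return false, fmt.Errorf("unknown tier %d", tier)
	}
	if !f.Enforced {
		return true, nil
	}
	return line >= f.Line && branch >= f.Branch, nil
}

func main() {
	// Line coverage clears the tier-1 floor, branch coverage does not.
	ok, err := MeetsFloor(1, 91, 80)
	fmt.Println(ok, err)
}
```

Keeping the floors in one map means a threshold change touches a single place, which matches the "update when tier definitions or thresholds change" status of this policy.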

Why a blanket percentage is wrong

  • A models/ package at 90% means you wrote tests for func (w Workspace) ID() string { return w.id }. No bugs caught, but the coverage number is green.
  • A tokens package at 75% means some rejection branch isn't covered. Maybe the exact branch that lets a revoked token still authenticate.
  • Blanket targets make the first case look equivalent to the second. They aren't.

Current state (as of 2026-04-23)

Run go test ./... -cover in each repo for up-to-date numbers. Snapshot:

workspace-server (Go)

| Package | Actual | Tier | Target | Gap |
|---|---|---|---|---|
| internal/handlers/tokens.go | 0% | 1 | 90% | 90 |
| internal/handlers/workspace_provision.go | 0% | 2 | 75% | 75 |
| internal/middleware/wsauth_middleware.go | ~48% | 1 | 90% | 42 |
| internal/provisioner | 45% | 3 | 75% | 30 |
| internal/scheduler | 49% | 3 | 75% | 26 |
| internal/channels | 40% | 4 | 70% | 30 |
| internal/orgtoken | 88% | 4 | 70% | |
| internal/crypto | 91% | 1 | 90% | |
| internal/supervised | 93% | 3 | 75% | |
| internal/plugins | 94% | 4 | 70% | |
| internal/envx | 100% | 5 | none | |

molecule-controlplane (Go)

| Package | Actual | Tier | Target | Gap |
|---|---|---|---|---|
| internal/awsapi | 18% | 7 | integration | |
| internal/provisioner | 48% | 3 | 75% | 27 |
| internal/handlers | 60% | 2 | 75% | 15 |
| internal/billing | 60% | 4 | 70% | 10 |
| internal/crypto | 68-80% | 1 | 90% | 10-22 |
| internal/auth | 96% | 1 | 90% | |
| internal/middleware | 97% | 1 | 90% | |
| internal/reserved | 100% | 5 | none | |
| internal/httpx | 100% | 4 | 70% | |
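The Actual and Gap columns in the Go tables can be regenerated from `go test ./... -cover` output instead of being copied by hand. A rough sketch, assuming the standard `coverage: N% of statements` summary line that `go test` prints per package. `ParseCoverage` and `Gap` are hypothetical helpers, and the module path in the sample is made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// coverLine matches the per-package summary printed by `go test -cover`, e.g.
//   ok  	example.com/core/internal/scheduler	0.41s	coverage: 49.0% of statements
var coverLine = regexp.MustCompile(`^ok\s+(\S+)\s.*coverage: ([0-9.]+)% of statements`)

// ParseCoverage extracts package path -> statement coverage percentage.
// Lines that do not match (failures, "no test files") are skipped.
func ParseCoverage(out string) map[string]float64 {
	result := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		m := coverLine.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		pct, err := strconv.ParseFloat(m[2], 64)
		if err != nil {
			continue
		}
		result[m[1]] = pct
	}
	return result
}

// Gap returns how many points the package is below its tier target.
// Packages at or above target have a gap of zero.
func Gap(actual, target float64) float64 {
	if actual >= target {
		return 0
	}
	return target - actual
}

func main() {
	// Sample output with a hypothetical module path.
	out := "ok  \texample.com/core/internal/provisioner\t0.41s\tcoverage: 45.0% of statements\n" +
		"?   \texample.com/core/internal/models\t[no test files]\n"
	cov := ParseCoverage(out)
	pkg := "example.com/core/internal/provisioner"
	fmt.Printf("%s actual=%.1f gap=%.1f\n", pkg, cov[pkg], Gap(cov[pkg], 75))
}
```

Note that this measures statement coverage per package, so per-file rows such as tokens.go would need a profile-based breakdown (for example from `go tool cover -func`) rather than this summary line.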

canvas (TypeScript)

No coverage instrumentation today. 900 tests / 58 files pass, but coverage isn't measured. See issue #1815 for the fix: set a 70% line floor in vitest.config.ts and gate CI on it.

workspace (Python)

No pytest/coverage config. See issue #1818: set up pytest-cov with --cov-fail-under=75 (ratchet from current baseline over 2-3 weeks).

Writing a good test

A good test:

  • Asserts a specific outcome, not that a function runs without error.
  • Covers the exact branch that bugs would live in — cross-tenant access, revoked-but-cached token, race on state transition.
  • Uses table-driven patterns when the code is a dispatch with N cases. One test row per case.
  • Mocks at system boundaries (DB, HTTP, time), not at internal package boundaries.
  • Survives refactors — tests behavior, not internal state.

A bad test:

  • Tests a getter that just returns a field.
  • Mocks the function under test itself.
  • Relies on time.Sleep or clock timing to assert order.
  • Asserts nil == nil to boost coverage.

Enforcement

CI gates

  • Go: go test ./... -cover, plus a pre-commit script that compares coverage against .coverage-baseline and fails when a tier-1 package drops by more than 2 points.
  • TypeScript: vitest --coverage with thresholds in vitest.config.ts. CI fails when coverage falls below them.
  • Python: pytest --cov-fail-under=75 in the Python CI job.
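The Go baseline comparison could look roughly like the sketch below. The on-disk format of .coverage-baseline is not specified here, so the sketch assumes it has already been loaded into a map of package path to percentage. `FindDrops`, `Drop`, and the tier-1 set are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// maxDrop is the tolerated decrease in percentage points for tier-1 packages.
const maxDrop = 2.0

// Drop describes one tier-1 package whose coverage fell too far.
type Drop struct {
	Package           string
	Baseline, Current float64
}

// FindDrops returns tier-1 packages whose coverage fell by more than
// maxDrop points relative to the baseline, sorted by package path.
// Assumption: a package missing from current is treated as 0% coverage.
func FindDrops(baseline, current map[string]float64, tier1 map[string]bool) []Drop {
	var drops []Drop
	for pkg, base := range baseline {
		if !tier1[pkg] {
			continue
		}
		cur := current[pkg] // zero when the package is missing
		if base-cur > maxDrop {
			drops = append(drops, Drop{Package: pkg, Baseline: base, Current: cur})
		}
	}
	sort.Slice(drops, func(i, j int) bool { return drops[i].Package < drops[j].Package })
	return drops
}

func main() {
	baseline := map[string]float64{"internal/crypto": 91, "internal/channels": 40}
	current := map[string]float64{"internal/crypto": 88, "internal/channels": 30}
	tier1 := map[string]bool{"internal/crypto": true}

	for _, d := range FindDrops(baseline, current, tier1) {
		fmt.Printf("%s dropped from %.1f%% to %.1f%%\n", d.Package, d.Baseline, d.Current)
	}
}
```

A real gate would exit non-zero when `FindDrops` returns anything. A drop of exactly 2 points passes, matching the "> 2 points" wording above.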

Review expectations

  • Any PR touching a tier-1 package that lowers its coverage needs an explicit reviewer sign-off and justification.
  • New code should arrive at or above its tier's floor.
  • Untested files in tier-1 or tier-2 should be flagged in review, not waved through.