ci(platform-go): add critical-path coverage gate + per-file report (#1823)

## Problem

External audit flagged critical security-path files at 0% coverage:
  - workspace-server/handlers/tokens.go            0%  (target 90%+)
  - workspace-server/handlers/workspace_provision  0%  (target 75%+)
  - workspace-server/middleware/wsauth            ~48% (target 90%+)

Tests *exist* for these files (tokens_test.go is 200 lines, workspace_
provision_test.go is 1138 lines) — they just don't exercise the critical
branches where auth/provisioning decisions happen. CI's existing coverage
step measured total coverage (floor 25%) but never checked per-file,
so any single file could drop to 0% and CI stayed green.

## Fix — Layer 1 of #1823 (strictly additive)

1. **Per-file coverage report** — advisory step prints every source file
   with its coverage, sorted worst-first. Reviewers see the gap at a
   glance. Does not fail the build.

2. **Critical-path per-file gate** — if any non-test source file in a
   security-sensitive directory (tokens, workspace_provision, a2a_proxy,
   registry, secrets, wsauth, crypto) has coverage ≤10%, CI fails with
   a specific error message pointing at the file + #1823.

3. **Unchanged: total floor stays at 25%** — ratcheting is a separate PR
   so this one has zero risk of breaking existing coverage. Ratchet plan
   lives in COVERAGE_FLOOR.md (monthly schedule through Oct 2026 to reach
   70% total / 70% critical).

## Why this specifically

"Tell devs to write tests" doesn't fix this — the prompts already
require tests ("Write tests for every handler, every query, every edge
case"), and the engineers mostly do. The gap is mechanical: CI generates
coverage.out and throws it away without checking per-file distribution.

This gate makes "no untested security path merges" a property of the CI,
not a property of QA agents who (as of today's incident) can go phantom-
busy for hours.

## Smoke test

Local awk-logic verification with synthetic coverage.out:
  - tokens.go at 2.5% (critical path, ≤10%)           → correctly FAILS
  - noncritical.go at 0.0% (not in critical list)     → correctly PASSES
  - wsauth_middleware.go at 65% (critical, above 10%) → correctly PASSES
  - crypto/kek.go at 85% (critical, above 10%)        → correctly PASSES

Regex bug caught and fixed: go tool cover -func emits
  file.go:LINE.COL:FUNC  PERCENT
The stripper needed :[0-9]+\..* not :[0-9]+:.*

## Follow-up (not in this PR)

- Layer 2 (issue #1823): per-changed-file delta gate via diff-cover,
  enforcing the prompt rule ">80% on changed files"
- Add these two new steps to branch protection required checks
- Canvas (Next.js) equivalent with vitest --coverage + threshold

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
rabbitblood 2026-04-23 11:12:40 -07:00
parent cfdaefe5bc
commit c4bb325267
2 changed files with 148 additions and 7 deletions

View File

@ -81,15 +81,78 @@ jobs:
continue-on-error: true # Warn but don't block until codebase is clean
- name: Run tests with race detection and coverage
run: go test -race -coverprofile=coverage.out ./...
- name: Check coverage baseline
- name: Per-file coverage report
# Advisory — lists every source file with its coverage so reviewers
# can see at-a-glance where gaps are. Sorted ascending so the worst
# offenders float to the top. Does NOT fail the build; the hard
# gate is the threshold check below. (#1823)
run: |
COVERAGE=$(go tool cover -func=coverage.out | grep total | awk '{print $3}' | sed 's/%//')
echo "Total coverage: ${COVERAGE}%"
THRESHOLD=25
awk "BEGIN{if ($COVERAGE < $THRESHOLD) exit 1}" || {
echo "::error::Coverage ${COVERAGE}% is below the ${THRESHOLD}% threshold"
echo "=== Per-file coverage (worst first) ==="
go tool cover -func=coverage.out \
| grep -v '^total:' \
| awk '{file=$1; sub(/:[0-9]+\..*/, "", file); pct=$NF; gsub(/%/,"",pct); s[file]+=pct; c[file]++}
END {for (f in s) printf "%6.1f%% %s\n", s[f]/c[f], f}' \
| sort -n
- name: Check coverage thresholds
# Enforces two gates from #1823 Layer 1:
# 1. Total floor (unchanged at 25% this PR — ratchet plan in
# COVERAGE_FLOOR.md). Keeping it where it was keeps this PR
# strictly additive — the NEW protection is gate 2.
# 2. Per-file zero-floor — any .go file (non-test) in a
# security-critical path with coverage ≤10% fails the build.
# Catches the exact case that triggered #1823 (tokens.go at 0%).
run: |
set -e
TOTAL_FLOOR=25
# Files/paths that cannot drop to 0% coverage. Add here carefully;
# this is the "protected paths" list for security-sensitive code.
CRITICAL_PATHS=(
"internal/handlers/tokens"
"internal/handlers/workspace_provision"
"internal/handlers/a2a_proxy"
"internal/handlers/registry"
"internal/handlers/secrets"
"internal/middleware/wsauth"
"internal/crypto"
)
TOTAL=$(go tool cover -func=coverage.out | grep '^total:' | awk '{print $3}' | sed 's/%//')
echo "Total coverage: ${TOTAL}%"
if awk "BEGIN{exit !($TOTAL < $TOTAL_FLOOR)}"; then
echo "::error::Total coverage ${TOTAL}% is below the ${TOTAL_FLOOR}% floor. See COVERAGE_FLOOR.md for ratchet plan."
exit 1
}
fi
# Gate 3: critical files must not be 0%
FAILED=0
go tool cover -func=coverage.out \
| grep -v '^total:' \
| awk '{file=$1; sub(/:[0-9]+\..*/, "", file); pct=$NF; gsub(/%/,"",pct); s[file]+=pct; c[file]++}
END {for (f in s) printf "%s %.1f\n", f, s[f]/c[f]}' \
> /tmp/perfile.txt
for path in "${CRITICAL_PATHS[@]}"; do
while read -r file pct; do
if [[ "$file" == *"$path"* ]] && [[ "$file" != *_test.go ]]; then
if awk "BEGIN{exit !($pct < 10)}"; then
echo "::error file=workspace-server/$file::Critical file at ${pct}% coverage — must be >=10% (target 80%). See #1823."
FAILED=1
fi
fi
done < /tmp/perfile.txt
done
if [ "$FAILED" -eq 1 ]; then
echo ""
echo "One or more security-critical files have ≤10% test coverage."
echo "These paths handle auth, tokens, secrets, or workspace provisioning —"
echo "a 0% file here is the exact gap that let CWE-22, CWE-78, KI-005 slip"
echo "through in past incidents. Add tests or document an exception in"
echo "COVERAGE_FLOOR.md with a linked issue and 14-day expiry."
exit 1
fi
canvas-build:
name: Canvas (Next.js)

78
COVERAGE_FLOOR.md Normal file
View File

@ -0,0 +1,78 @@
# Coverage Floor
CI enforces three coverage gates on `workspace-server` (Go). All defined in
`.github/workflows/ci.yml``platform-build` job.
## Current floors (2026-04-23)
| Gate | Threshold | What fails |
|---|---|---|
| **Total floor** | `25%` | `go tool cover -func` reports total below floor |
| **Critical-path per-file floor** | `10%` | Any non-test source file in a security-critical path with coverage ≤10% |
| **Per-file report** | advisory | Printed in CI log, sorted worst-first, does not fail |
Total floor starts at 25% (unchanged from pre-#1823 to keep this PR strictly
additive). The new protection is the critical-path per-file floor, which
directly closes the gap that prompted the issue. Ratchet plan below begins
the month after to let the team first observe the gate in action.
## Security-critical paths (Gate 2)
Changes to these paths have historically introduced security issues (CWE-22,
CWE-78, KI-005, SSRF) or billing/auth risk. Coverage must not drop to zero.
- `internal/handlers/tokens*`
- `internal/handlers/workspace_provision*`
- `internal/handlers/a2a_proxy*`
- `internal/handlers/registry*`
- `internal/handlers/secrets*`
- `internal/middleware/wsauth*`
- `internal/crypto*`
## Ratchet plan
Floor ratchets upward on a fixed cadence. Any ratchet is a PR — reviewable,
reversible, and creates history. The table below is the intended schedule.
| Date | Total floor | Critical-path floor | Notes |
|---|---|---|---|
| 2026-04-23 | 25% | 10% | Initial gate (this file). |
| 2026-05-23 | 30% | 20% | First ratchet |
| 2026-06-23 | 40% | 30% | |
| 2026-07-23 | 50% | 40% | |
| 2026-08-23 | 55% | 50% | |
| 2026-09-23 | 60% | 60% | |
| 2026-10-23 | 70% | 70% | Target steady-state |
The target end-state matches the per-role QA prompts which specify
"coverage >80% on changed files". CI enforces the floor; reviewers still
enforce the per-PR bar.
## Exceptions
If a critical-path file genuinely cannot have coverage above the floor (e.g.
thin wrapper around a third-party SDK with no branches to test), add an entry
here with:
1. **File**: `internal/handlers/example.go`
2. **Reason**: Why coverage can't hit the floor
3. **Tracking issue**: GitHub issue for the real fix
4. **Expiry**: 14 days from entry date; after expiry either coverage is fixed
or the issue is closed as "accepted technical debt"
### Active exceptions
*(none — add here if you need to land code that legitimately can't clear the floor)*
## Why this gate exists
Issue #1823: an external audit found critical files at 0% coverage despite
test files existing with hundreds of lines. The existing CI step measured
coverage but didn't enforce a meaningful threshold. Any file could go from
80% → 0% and CI stayed green, because the single gate (total ≥25%) ignored
per-file distribution.
This gate makes "no untested critical paths merged" a mechanical property of
the CI, not a behavioural property of QA agents or individual reviewers —
which is the only way to make it survive fleet outages, agent rotations, or
QA process changes.