test(provisioner): fast local-Docker parity test for the token-injection ownership bug class #1332

Open
core-be wants to merge 4 commits from test/local-provisioner-token-ownership-parity into main

4 Commits

Author SHA1 Message Date
c9175c071c ci(provisioner-parity): enforce the fast local prod-mimic parity test as a fail-closed merge gate
Some checks failed
CI / Shellcheck (E2E scripts) (pull_request) Successful in 43s
E2E API Smoke Test / detect-changes (pull_request) Successful in 42s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
E2E Chat / detect-changes (pull_request) Successful in 35s
Harness Replays / detect-changes (pull_request) Successful in 37s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 20s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 1m20s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 2m8s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 3m6s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Failing after 2m18s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 2m23s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 23s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 22s
qa-review / approved (pull_request) Failing after 29s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 2m5s
security-review / approved (pull_request) Failing after 38s
CI / Python Lint & Test (pull_request) Failing after 8m41s
CI / all-required (pull_request) Failing after 8m33s
CI / Canvas (Next.js) (pull_request) Successful in 22m56s
CI / Provisioner Parity (pull_request) Has been cancelled
CI / Platform (Go) (pull_request) Successful in 24m59s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3m6s
Harness Replays / Harness Replays (pull_request) Successful in 15s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 36s
E2E Chat / E2E Chat (pull_request) Failing after 10m31s
gate-check-v3 / gate-check (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Successful in 3s
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Failing after 50s
sop-checklist / all-items-acked (pull_request) acked: 7/7
sop-checklist / na-declarations (pull_request) N/A: (none)
The token-injection/ownership bug class shipped to the fleet (Hermes
#1877/#418) and shipped again on template-hermes #162. In this bug, the
platform delivers /configs/.auth_token as root:root AFTER the entrypoint
chown, so the uid-1000 agent's save_token O_WRONLY|O_TRUNC open is
denied and list_peers / heartbeat return 401 forever. It shipped twice
precisely because nothing ENFORCED the local check. The dev-SOP only
referenced feedback_mandatory_local_e2e_before_ship as prose, and prose
does not stop a PR.

This wires the //go:build local provisioner-parity test (added in this
PR) into CI as a real gate:

- new provisioner-parity job runs
  `go test -tags local -run TestTokenOwnership` against the runner's
  Docker daemon. The test self-skips when no Docker daemon is reachable
  (keeping `make test` / Platform (Go) green on dev machines); this job
  runs on a Docker-capable runner and treats a SKIP or an empty run as
  a FAILURE (fail-closed).
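- outcomes are parsed from the test2json stream as real JSON, not
  grepped. Each event puts the Package field between Action and Test,
  so a grep that expects Action and Test to be adjacent matches nothing
  and counts zero tests. That vacuous-green trap was caught and fixed
  during verification.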
- requires BOTH the headline parity test AND its fail-direction proof
  control (TestTokenOwnership_FailPre_ProvesCatch) to pass.
- joins the `CI / all-required` aggregator (RFC internal#219 §2) so
  branch protection fail-closes on it with NO branch-protection edit.

Verified locally: PASS-case exit 0; Hermes-bug-present FAIL-case exit
1; no-daemon SKIP-case exit 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 11:56:22 -07:00
efd755604f fix(provisioner): add local-only build tag to Docker ownership test
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 30s
CI / Detect changes (pull_request) Successful in 35s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 38s
E2E API Smoke Test / detect-changes (pull_request) Successful in 24s
E2E Chat / detect-changes (pull_request) Successful in 27s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 24s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 1m8s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
Handlers Postgres Integration / detect-changes (pull_request) Successful in 19s
Harness Replays / detect-changes (pull_request) Successful in 21s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 34s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 21s
gate-check-v3 / gate-check (pull_request) Successful in 27s
qa-review / approved (pull_request) Failing after 19s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m35s
security-review / approved (pull_request) Failing after 23s
sop-checklist / all-items-acked (pull_request) Successful in 21s
sop-tier-check / tier-check (pull_request) Successful in 18s
CI / Python Lint & Test (pull_request) Successful in 8m41s
CI / Canvas (Next.js) (pull_request) Successful in 24m4s
CI / Platform (Go) (pull_request) Successful in 27m43s
CI / all-required (pull_request) Successful in 27m5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 10s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 25s
Harness Replays / Harness Replays (pull_request) Successful in 12s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m52s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Chat / E2E Chat (pull_request) Failing after 10m31s
provisioner_token_ownership_local_test.go invokes WriteAuthTokenToVolume
and WriteFilesToContainer against a real local Docker daemon; it is
designed to be run manually with `go test -tags local`, not in CI.

Without the tag, `go test ./...` in the CI pipeline picks it up even
though no Docker daemon is available there, so the test hangs until the
runner kills it (~26 min).

Add `//go:build local` + `// +build local` (matching the existing
integration-test pattern) so CI's `go test ./...` silently skips it.
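
For illustration, a small Go sketch of why the tag hides the file: a
file whose //go:build expression evaluates to false is excluded from
the build entirely, so `go test ./...` never compiles or runs it.
`includedWithTags` is a hypothetical helper built on the standard
library's go/build/constraint package. It deliberately ignores
GOOS/GOARCH tags and file-name suffixes, and skips the `// +build`
line, which exists for pre-Go 1.17 toolchains.

```go
package main

import (
	"bufio"
	"fmt"
	"go/build/constraint"
	"strings"
)

// includedWithTags (hypothetical helper) reports whether a Go source file
// would be compiled when exactly the given build tags are set, by evaluating
// its //go:build line. Simplified: GOOS/GOARCH and file suffixes are ignored.
func includedWithTags(src string, tags ...string) (bool, error) {
	set := map[string]bool{}
	for _, t := range tags {
		set[t] = true
	}
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "package ") {
			break // build constraints must appear before the package clause
		}
		if constraint.IsGoBuild(line) {
			expr, err := constraint.Parse(line)
			if err != nil {
				return false, err
			}
			return expr.Eval(func(tag string) bool { return set[tag] }), nil
		}
	}
	return true, sc.Err() // no //go:build line: always included
}

func main() {
	src := "//go:build local\n// +build local\n\npackage provisioner\n"
	plain, _ := includedWithTags(src)
	tagged, _ := includedWithTags(src, "local")
	fmt.Println("go test ./... compiles it:", plain)
	fmt.Println("go test -tags local ./... compiles it:", tagged)
}
```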

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 15:47:28 +00:00
71ba2a572c Merge branch 'main' into test/local-provisioner-token-ownership-parity
Some checks failed
CI / Canvas Deploy Reminder (pull_request) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 17s
CI / Detect changes (pull_request) Successful in 22s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 43s
E2E API Smoke Test / detect-changes (pull_request) Successful in 38s
E2E Chat / detect-changes (pull_request) Successful in 23s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 24s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 17s
Harness Replays / detect-changes (pull_request) Successful in 19s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 1m7s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 27s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 23s
gate-check-v3 / gate-check (pull_request) Successful in 34s
qa-review / approved (pull_request) Successful in 21s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m38s
security-review / approved (pull_request) Failing after 21s
sop-checklist / all-items-acked (pull_request) Successful in 16s
sop-tier-check / tier-check (pull_request) Successful in 20s
CI / Python Lint & Test (pull_request) Successful in 8m37s
CI / Canvas (Next.js) (pull_request) Successful in 25m41s
CI / Platform (Go) (pull_request) Failing after 28m36s
CI / all-required (pull_request) Failing after 25m17s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 16s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 19s
Harness Replays / Harness Replays (pull_request) Successful in 11s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 11s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3m6s
E2E Chat / E2E Chat (pull_request) Failing after 12m10s
2026-05-16 12:54:37 +00:00
4d3c326fd9 test(provisioner): fast local-Docker parity test for the token-injection ownership bug class
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Failing after 0s
lint-required-no-paths / lint-required-no-paths (pull_request) Failing after 0s
Secret scan / Scan diff for credential-shaped strings (pull_request) Failing after 0s
qa-review / approved (pull_request) Failing after 0s
security-review / approved (pull_request) Failing after 1s
sop-checklist / all-items-acked (pull_request) Successful in 15s
gate-check-v3 / gate-check (pull_request) Waiting to run
sop-tier-check / tier-check (pull_request) Waiting to run
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Successful in 2m1s
The Hermes fleet-wide list_peers 401 (#1877/#418) came from
WriteAuthTokenToVolume + WriteFilesToContainer delivering the /configs
token files as root:root AFTER the entrypoint's chown -R agent /configs,
so the a2a_mcp_server running as AgentUID got EACCES → empty bearer →
401. Those are Docker API ops, NOT AWS. They were "prod-only" only
because the local stack didn't drive the same post-start re-injection
sequence, NOT because they need EC2.

This test invokes the REAL WriteAuthTokenToVolume + WriteFilesToContainer
against the LOCAL Docker daemon and asserts AgentUID can re-write
/configs/.auth_token + .platform_inbound_secret (the save_token
O_WRONLY|O_TRUNC recovery path that actually 401'd Hermes — a read
probe stays green on root:root because the file is world-readable, so
that would have been a vacuous proxy assertion).
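
A simplified Go model of why the read probe is vacuous while the write
probe is not. It assumes root:root delivery with mode 0644 (the text
only says world-readable) and mode 0600 for the agent-owned fix.
`fileMeta` and `canOpen` are hypothetical, and the check ignores ACLs,
supplementary groups, and capabilities beyond root's DAC override.

```go
package main

import "fmt"

// fileMeta (hypothetical) is a simplified model of a file's owner and mode.
type fileMeta struct {
	UID, GID uint32
	Mode     uint32 // permission bits only, e.g. 0o644
}

// canOpen (hypothetical) is a simplified POSIX permission check: root
// (uid 0) bypasses the bits; otherwise the owner, group, or other triplet
// applies. ACLs and supplementary groups are ignored.
func canOpen(f fileMeta, uid, gid uint32, write bool) bool {
	if uid == 0 {
		return true
	}
	var shift uint32 // 0 selects the "other" triplet
	switch {
	case uid == f.UID:
		shift = 6
	case gid == f.GID:
		shift = 3
	}
	bit := uint32(0o4) // read bit
	if write {
		bit = 0o2 // an O_WRONLY|O_TRUNC open needs the write bit
	}
	return (f.Mode>>shift)&bit != 0
}

func main() {
	const agentUID = 1000 // assumption: the uid-1000 agent from the PR text
	rootOwned := fileMeta{UID: 0, GID: 0, Mode: 0o644}               // assumed root:root delivery
	agentOwned := fileMeta{UID: agentUID, GID: agentUID, Mode: 0o600} // assumed post-fix state
	fmt.Println("root:root read probe:", canOpen(rootOwned, agentUID, agentUID, false))
	fmt.Println("root:root save_token write:", canOpen(rootOwned, agentUID, agentUID, true))
	fmt.Println("agent-owned save_token write:", canOpen(agentOwned, agentUID, agentUID, true))
}
```

Running it prints true, false, true: the read probe passes on the broken
root:root file, and only the write check separates the two states.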

Demonstrated in both directions against the two code states:
  - pre-fix (pristine staging): the headline test FAILS in ~0.9s; it
    would have caught Hermes locally instead of after a ~1h EC2
    round-trip.
  - post-fix (this PR's base, the agent-owned-injection fix): PASSES
    in ~0.87s.
TestTokenOwnership_FailPre_ProvesCatch pins the pre-fix root:root
delivery shape independently so the catch stays demonstrable on this
fix-based branch (the assertion is load-bearing, not vacuously green).
TestTokenOwnership_DockerIsLocalNotAWS statically guards that the
provisioner has no AWS SDK dependency, which is why this bug class is
locally reproducible at all.
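
One way such a static guard could work, sketched in Go under the
assumption that it scans import declarations (the PR does not say how
TestTokenOwnership_DockerIsLocalNotAWS is implemented). `awsImports` is
hypothetical. A full guard would run it over every .go file in the
provisioner package, or inspect `go list -deps` output to also catch
transitive imports.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// awsImports (hypothetical helper) returns the AWS SDK import paths found in
// one Go source file. It parses imports only, so no build is required.
func awsImports(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var hits []string
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		// Matches both github.com/aws/aws-sdk-go (v1) and .../aws-sdk-go-v2.
		if strings.HasPrefix(path, "github.com/aws/aws-sdk-go") {
			hits = append(hits, path)
		}
	}
	return hits, nil
}

func main() {
	src := `package provisioner

import (
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	"github.com/docker/docker/client"
)
`
	hits, err := awsImports("example.go", src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("AWS SDK imports:", hits)
}
```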

Wired into the mandatory local-E2E gate via `make test-local-e2e`
(feedback_mandatory_local_e2e_before_ship); self-skips when no Docker
daemon is reachable so `make test`/CI stays green on Docker-less
runners. Local fast counterpart to the staging-required gate.

Stacked on fix/workspace-token-injection-agent-owned (PR #1327) so it
lands green; references the exported provisioner.AgentUID contract
rather than a duplicated literal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 02:37:07 -07:00