1 commit

Author SHA1 Message Date
core-be e260bf24bd fix(t4): docker_socket_reachable + pid_host_visible probes — drop unnecessary sudo / suppress / wrong perms (RCA from ad71d1f, unblocks template-cc PR#39 migration pilot)
Two gate-side probe bugs identified by RCA ad71d1f. template-cc PR#39
(uniform-contract migration pilot) surfaced both; the fixes belong on the
gate side, not in template-cc.

1. docker_socket_reachable
   Before: sudo -n docker version --format '{{.Server.Version}}' >/dev/null 2>&1
   After:  docker -H unix:///var/run/docker.sock version --format '{{.Server.Version}}'
   - The `agent` user is in the `docker` group, so sudo is unnecessary. It
     was also harmful: under `sudo -n` the PATH/environment differs, and the
     probe failed with the docker CLI not found.
   - The explicit `-H unix:///var/run/docker.sock` pins the transport, so the
     probe tests the bind-mounted socket named in the Description rather
     than whatever endpoint the CLI would otherwise resolve (e.g. from
     DOCKER_HOST).
   - Dropping `>/dev/null 2>&1` lets daemon errors surface for diagnosis. A
     non-zero exit still fails the probe, but the failure mode is now visible
     in CI/probe output instead of being silently swallowed.

2. pid_host_visible
   Before: [ -d /proc/1/root ] && [ "$(sudo -n readlink /proc/1/ns/pid)" = "$(sudo -n readlink /proc/self/ns/pid)" ]
   After:  sudo -n test -d /proc/1/root && [ "$(sudo -n readlink /proc/1/ns/pid)" = "$(sudo -n readlink /proc/self/ns/pid)" ]
   - /proc/1/root is a root-owned symlink; `[ -d ... ]` evaluated as
     uid 1000 (the `agent` user) returns false (EACCES) even when the host
     PID namespace IS reachable: a false negative.
   - This matches the host-reach probes that already PASS, which route
     host-namespace inspection through `sudo -n` consistently:
       host_root_reach_via_nsenter: `sudo -n nsenter ...`
       host_fs_write_readback:      `sudo -n sh -c "..."`
   - The readlink half already uses `sudo -n`; this fix makes the directory
     check consistent with it.

Probes remain executable from inside the workspace container as the
`agent` user. No semantics changed elsewhere (advisory/hard
classification preserved, no probe added/removed). Total diff: 2 lines.

/sop-ack root-cause-and-no-backwards-compat
2026-05-20 11:37:28 -07:00
@@ -121,7 +121,7 @@ func T4PrivilegeContract() []T4Capability {
 		{
 			Name: "docker_socket_reachable",
 			Description: "/var/run/docker.sock is bind-mounted into the container so the agent can manage other containers (T4 use case: agent-as-orchestrator). Proven by 'docker version' returning a server section, which requires the daemon to answer over the socket.",
-			Probe: `sudo -n docker version --format '{{.Server.Version}}' >/dev/null 2>&1`,
+			Probe: `docker -H unix:///var/run/docker.sock version --format '{{.Server.Version}}'`,
 			Severity: SeverityHard,
 			Source: "provisioner.go applyHostConfig T4 branch (case 4)",
 		},
@@ -169,7 +169,7 @@ func T4PrivilegeContract() []T4Capability {
 		{
 			Name: "pid_host_visible",
 			Description: "Host PID namespace is shared (--pid=host). The container can see host process 1 (systemd or pid-1 on the EC2 instance). Required for nsenter into host mount/pid namespaces.",
-			Probe: `[ -d /proc/1/root ] && [ "$(sudo -n readlink /proc/1/ns/pid)" = "$(sudo -n readlink /proc/self/ns/pid)" ]`,
+			Probe: `sudo -n test -d /proc/1/root && [ "$(sudo -n readlink /proc/1/ns/pid)" = "$(sudo -n readlink /proc/self/ns/pid)" ]`,
 			Severity: SeverityHard,
 			Source: "provisioner.go applyHostConfig T4 branch (case 4): hostCfg.PidMode = 'host'",
 		},