hongming eb68b7b025
CI / validate (push) Blocked by required conditions
CI / validate (pull_request) Blocked by required conditions
CI / Template validation (static) (push) Successful in 1m20s
CI / Adapter unit tests (push) Successful in 1m9s
CI / Template validation (static) (pull_request) Successful in 1m25s
CI / Adapter unit tests (pull_request) Successful in 1m2s
CI / T4 tier-4 conformance (live) (push) Failing after 52s
CI / Template validation (runtime) (push) Failing after 2m51s
CI / T4 tier-4 conformance (live) (pull_request) Failing after 51s
CI / Template validation (runtime) (pull_request) Failing after 2m40s
fix(auth): codex_auth_refresh.sh portable python3 path
PR#19 (internal#569) shipped the OAuth refresh watchdog hardcoded to
`/opt/molecule-venv/bin/python3`. The codex image is built FROM
python:3.11-slim — python3 lives at /usr/local/bin/python3 and no
/opt/molecule-venv ever exists. Every helper invocation therefore
exited 127 → OAuth refresh never fired → id_token expired silently →
Researcher wedged upstream of stdout (ae2c3012 diagnosis: 55h past
expiration, access_token still valid ~7.7d but id_token rejected).
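
The rc=127 symptom can be reproduced in any POSIX shell: invoking an interpreter through a path that does not exist yields exit status 127. A minimal sketch, using a deliberately missing stand-in path (hypothetical, chosen so the example behaves the same on any machine) in place of the hardcoded one:

```shell
# Stand-in for the hardcoded interpreter path; assumed not to exist anywhere.
missing_python=/nonexistent/molecule-venv/bin/python3

rc=0
# The shell cannot find the file, so the call fails before any Python runs.
"$missing_python" -c 'print("refresh")' 2>/dev/null || rc=$?

echo "helper exited with rc=$rc"
```

Because the failure happens before the interpreter starts, nothing in the Python helper can log or recover from it, which is consistent with the refresh never firing.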

Fix:
- Resolve python3 portably via `command -v python3` at script start
  (CODEX_PYTHON env override for test/dev rigs); fail-fast with rc=127
  if no python3 found, so the symptom is loud not silent.
- Replace all 6 hardcoded `/opt/molecule-venv/bin/python3` call sites
  with "$PYTHON_BIN".
- Drop the test harness's `_patch_script_python` hack — the test file
  used to know about the broken path and patch a copy of the script,
  which masked the production breakage. Tests now exec the real
  shipped script with CODEX_PYTHON pointing at sys.executable.
- Add `test_script_does_not_exit_127_with_portable_python_path` as a
  regression-pin: runs the script WITHOUT CODEX_PYTHON so the
  `command -v python3` resolver is genuinely exercised, asserts the
  script never exits 127 and produces the expected skip path.
- Add an image-build smoke check in the Dockerfile: runs
  `codex_auth_refresh.sh --once` against an absent CODEX_HOME at
  build time and fails the build on any rc other than 1 (rc=127
  included). This makes the watchdog a hard image-build invariant; a
  future regression of the python path cannot ship.
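
The resolver described in the first bullet might look roughly like the sketch below. It is not the shipped code: the function name `resolve_python` and the error text are assumptions, and only `CODEX_PYTHON`, `PYTHON_BIN`, `command -v python3`, and the rc=127 fail-fast come from this message.

```shell
# Sketch: resolve the interpreter once at script start.
# CODEX_PYTHON (if set) overrides the PATH lookup for test/dev rigs.
resolve_python() {
    candidate="${CODEX_PYTHON:-python3}"
    # `command -v` prints the resolved path, or fails if nothing usable exists.
    resolved="$(command -v "$candidate" 2>/dev/null)" || {
        echo "codex_auth_refresh: no python3 found (tried: $candidate)" >&2
        return 127
    }
    printf '%s\n' "$resolved"
}

if PYTHON_BIN="$(resolve_python)"; then
    echo "using interpreter: $PYTHON_BIN"
else
    # The real script fails fast here; the sketch only reports the status
    # so it can be pasted into an interactive shell without closing it.
    echo "resolver failed with rc=$?" >&2
fi
```

Every later call site would then use "$PYTHON_BIN" instead of a literal path, so a missing interpreter surfaces once, loudly, at startup.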

Verified locally:
- bash -n codex_auth_refresh.sh → OK
- shellcheck -S error codex_auth_refresh.sh + boot helpers → 0
- pytest tests/test_auth_refresh_watchdog.py → 8 passed
- pytest tests/ (full) → 75 passed
- Direct script smoke with empty CODEX_HOME → rc=1
  "absent or empty" (NOT rc=127)
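
The direct smoke above can be turned into a pass/fail verdict along these lines. This is a sketch under assumptions: the helper name `smoke_verdict` and its messages are illustrative, and the script location in the usage comment is a guess; only the rc=1 skip path and the rc=127 failure come from this message.

```shell
# Classify the watchdog's exit status: rc=1 is the expected skip path,
# rc=127 means the interpreter was not found, anything else is unexpected.
smoke_verdict() {
    case "$1" in
        1)   echo "ok: skip path (credentials absent or empty)" ;;
        127) echo "FAIL: interpreter not found"; return 1 ;;
        *)   echo "FAIL: unexpected rc=$1"; return 1 ;;
    esac
}

# Usage (assumes the script sits in the current directory):
#   rc=0
#   CODEX_HOME="$(mktemp -d)" ./codex_auth_refresh.sh --once || rc=$?
#   smoke_verdict "$rc"
smoke_verdict 1
```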

Cross-links:
- ae2c3012 diagnosis (Researcher wedge, id_token expiry, access_token
  remaining ~198h)
- PR#19 (internal#569) — the auto-refresh feature this fix unblocks
- feedback_image_promote_is_not_user_live — cascade is image-build →
  ECR push → CP pin promote → workspace recreate before user-live

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 03:01:10 -07:00