fix(ci): printf format-string sink + filename word-split in secret-scan

Two latent bash bugs in the canonical secret-scan workflow caught
during the post-merge review of molecule-controlplane #301 (a
private consumer that inlined this workflow's logic and got both
fixes there). Same bugs apply here; fixing in canonical means every
public consumer (gh-identity, github-app-auth, the 8 workspace
template repos) inherits the fix on their next workflow_call.

Bug 1: `printf "$OFFENDING"` is a format-string sink.

  OFFENDING is built from filenames: `${f} (matched: ${pattern})\n`.
  When passed to printf as the first argument, `%` characters in a
  filename are interpreted as conversion specifiers — corrupting the
  error message or printing `%(missing)` artifacts. No filename in
  the current tree triggers it, but a future test fixture, build
  artifact, or contributor-supplied path could.

  Fix: `printf '%b' "$OFFENDING"` interprets the literal `\n` we
  appended without treating OFFENDING as a format string.

Bug 2: `for f in $CHANGED` word-splits on whitespace.

  Filenames containing spaces would split into multiple tokens. The
  self-exclude check (`[ "$f" = "$SELF" ] && continue`) and the diff
  lookup would both operate on partial-path tokens. No filename in
  the current tree has whitespace, but the failure would be silent
  if one ever did.

  Fix: `while IFS= read -r f; do ... done <<< "$CHANGED"` reads
  whole lines as filenames. Added `[ -z "$f" ] && continue` to
  match the original `for` loop's implicit empty-input skip.

Both fixes are mechanically straightforward (~16 lines net diff,
mostly comments documenting the why). No behavior change for
filenames in the current tree; strictly better for the edge cases.

The same fixes already shipped in molecule-controlplane via #301
which inlined a copy of this workflow. The runtime's bundled
pre-commit hook (molecule-ai-workspace-runtime:
molecule_runtime/scripts/pre-commit-checks.sh) likely has the same
bugs — flagged as a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Hongming Wang 2026-04-28 14:02:50 -07:00
parent 7eeac153de
commit 2c8792d3e0

View File

@ -148,7 +148,13 @@ jobs:
SELF=".github/workflows/secret-scan.yml"
OFFENDING=""
for f in $CHANGED; do
# `while IFS= read -r` (not `for f in $CHANGED`) so filenames
# containing whitespace don't word-split silently — a path
# with a space would otherwise produce two iterations on
# tokens that aren't real filenames, breaking the
# self-exclude + diff lookup.
while IFS= read -r f; do
[ -z "$f" ] && continue
[ "$f" = "$SELF" ] && continue
if [ -n "$DIFF_RANGE" ]; then
ADDED=$(git diff --no-color --unified=0 "$BASE" "$HEAD" -- "$f" 2>/dev/null | grep -E '^\+[^+]' || true)
@ -164,11 +170,18 @@ jobs:
break
fi
done
done
done <<< "$CHANGED"
if [ -n "$OFFENDING" ]; then
echo "::error::Credential-shaped strings detected in diff additions:"
printf "$OFFENDING"
# `printf '%b' "$OFFENDING"` interprets backslash escapes
# (the literal `\n` we appended above becomes a newline)
# WITHOUT treating OFFENDING as a format string. Plain
# `printf "$OFFENDING"` is a format-string sink: a filename
# containing `%` would be interpreted as a conversion
# specifier, corrupting the error message (or printing
# `%(missing)` artifacts).
printf '%b' "$OFFENDING"
echo ""
echo "The actual matched values are NOT echoed here, deliberately —"
echo "round-tripping a leaked credential into CI logs widens the blast"