Splits the reusable validator into two jobs to keep external fork
PRs from running arbitrary template code on the runner.
Background
The reusable workflow runs three primitives that execute
template-supplied code:
- pip install -r requirements.txt (setup.py + post-install hooks)
- importlib.exec_module(adapter) (top-level Python in adapter.py)
- docker build (RUN steps in Dockerfile)
Token scope is already minimal (contents: read), GitHub forced
fork-PR tokens read-only in 2021, and the workflow_call interface
doesn't accept secrets. So the actual exploit surface is: what can
a malicious actor do with arbitrary code execution on a
GitHub-hosted runner that has no useful credentials? Answer:
crypto-mine, DNS-exfiltrate runner metadata, or attempt lateral
movement within the runner's network. Annoying, not catastrophic,
but a real attack surface that this PR closes.
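For illustration, a minimal Python sketch of why the adapter import
counts as code execution while AST parsing does not. The adapter
source, the DemoAdapter class, the ADAPTER_SIDE_EFFECT variable and
both helper names are hypothetical; the real validator may be
structured differently.

```python
import ast
import importlib.util
import os
import tempfile
import textwrap

# Hypothetical adapter whose top-level code has a visible side effect.
ADAPTER_SOURCE = textwrap.dedent(
    """
    import os
    os.environ["ADAPTER_SIDE_EFFECT"] = "ran"  # executes at import time

    class DemoAdapter:
        pass
    """
)


def class_names_without_import(path):
    """Parse the file into an AST and list top-level classes. Nothing executes."""
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read(), filename=path)
    return [node.name for node in tree.body if isinstance(node, ast.ClassDef)]


def load_with_exec_module(path):
    """Import the file from its path. Top-level statements execute."""
    spec = importlib.util.spec_from_file_location("demo_adapter", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def demo():
    """Return (class names, side effect after parse, side effect after import)."""
    os.environ.pop("ADAPTER_SIDE_EFFECT", None)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "adapter.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write(ADAPTER_SOURCE)

        names = class_names_without_import(path)
        after_parse = os.environ.get("ADAPTER_SIDE_EFFECT")

        load_with_exec_module(path)
        after_exec = os.environ.get("ADAPTER_SIDE_EFFECT")
    return names, after_parse, after_exec
```

Parsing finds the class without running the file; only the
exec_module path triggers the side effect, which is the primitive a
hostile adapter.py would abuse.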
The fix
Two-job split:
  validate-static    Always runs, including external fork PRs.
                     File-content checks (secret scan, YAML parse,
                     AST inspection of adapter.py without import),
                     pip install only the validator's pyyaml dep
                     (not the template's requirements.txt). NO
                     third-party code execution.
  validate-runtime   Skipped when
                     github.event.pull_request.head.repo.fork == true.
                     pip install requirements.txt + adapter import +
                     docker build. Internal PRs and push events to
                     internal branches still get the full coverage.
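The gating rule for validate-runtime can be modelled as a small
predicate over the event payload. This is a sketch only: the function
name should_run_runtime is hypothetical, and it assumes that a missing
or null field along the path behaves like null in the workflow
expression, so non-PR events keep running the job.

```python
def should_run_runtime(event):
    """Model of `github.event.pull_request.head.repo.fork != true`.

    Walks the payload one key at a time. If any level is missing or
    null (for example a push event with no pull_request), the PR is
    not a known fork and the runtime job runs.
    """
    node = event
    for key in ("pull_request", "head", "repo", "fork"):
        if not isinstance(node, dict):
            return True
        node = node.get(key)
    return node is not True
```

Only an explicit fork == true skips the job; internal PRs
(fork == false) and events without a pull_request payload fall
through to full coverage.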
The validator script gains a --static-only flag that skips
check_adapter_runtime_load() (the function that calls
exec_module). The validate-static job uses it; validate-runtime
uses the existing full mode.
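A minimal sketch of how such a flag could be wired, assuming an
argparse-based entry point. Only --static-only and
check_adapter_runtime_load() come from this change; check_static, run,
and the stubbed check bodies are hypothetical placeholders, not the
validator's real code.

```python
import argparse


def check_static(root):
    """Placeholder for the file-content checks (hypothetical name)."""
    return []


def check_adapter_runtime_load(root):
    """Placeholder for the import-based check that calls exec_module."""
    return []


def run(argv=None):
    """Parse flags, pick the checks to run, and collect their errors."""
    parser = argparse.ArgumentParser(description="Validate a workspace template")
    parser.add_argument(
        "--static-only",
        action="store_true",
        help="skip checks that execute template-supplied code",
    )
    args = parser.parse_args(argv)

    checks = [check_static]
    if not args.static_only:
        # Full mode: also import adapter.py, which executes its top level.
        checks.append(check_adapter_runtime_load)

    errors = []
    for check in checks:
        errors.extend(check("."))
    return [check.__name__ for check in checks], errors
```

With this shape, run(["--static-only"]) never reaches the
import-based check, while run([]) keeps the existing full behaviour.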
Trade-off
External contributors get static feedback only on their PR. If
their template metadata passes static checks but breaks runtime
loading, branch protection on staging/main blocks the merge once
runtime validation runs (post-merge or after an internal
contributor reposts). Fewer false-positive CI failures for honest
external contributors; same coverage at the merge-protected
boundary.
What this does NOT close
- Maintainer-approved external PRs that consciously execute
  third-party code. The maintainer must approve a workflow run
  via GitHub's first-time-contributor gate; that's a human
  decision, not a workflow-level gate.
- requirements.txt that pulls a malicious transitive dep from
  PyPI even on internal PRs. Mitigated by branch-protection +
  human review of PRs that touch requirements.txt.
Closes task #135.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
167 lines
7.6 KiB
YAML
name: Validate Workspace Template
on:
  workflow_call:

# Defense-in-depth on the GITHUB_TOKEN scope. This workflow runs
# untrusted-by-design code from the calling template repo: pip
# installs the template's requirements.txt (post-install hooks),
# imports adapter.py, and `docker build`s the Dockerfile (RUN
# steps). Each of those primitives can execute arbitrary code with
# the token in env. Pinning `contents: read` means the worst a
# malicious template PR can do with the token is read public repo
# state: no write to issues, no push to branches, no comment-spam,
# no workflow re-trigger.
#
# Fork-PR lockdown (#135): the workflow splits into two jobs:
#
#   validate-static  - file-content checks only (secret scan, YAML
#                      parse, AST inspection of adapter.py without
#                      import). Always runs, including external fork
#                      PRs. Safe because no third-party code executes.
#
#   validate-runtime - pip install requirements.txt + import
#                      adapter.py + docker build. SKIPPED on fork
#                      PRs because each step is arbitrary code
#                      execution from the template repo's perspective.
#                      Internal PRs and post-merge runs still get
#                      the full coverage.
#
# What this prevents: a malicious external PR can no longer
# crypto-mine on the runner, DNS-exfiltrate runner metadata, or
# attempt to read GitHub-Actions internal env via a setup.py
# postinstall hook. They still get static feedback (secret scan
# is the most important security check anyway).
#
# What this does NOT prevent: malicious template metadata that
# passes static checks. The runtime job catches those once the PR
# merges (or an internal contributor reposts the change), at which
# point branch protection on staging/main blocks the merge if
# runtime validation fails.
permissions:
  contents: read

jobs:
  validate-static:
    name: Template validation (static)
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      # Calling template repo (Dockerfile + config.yaml + adapter.py).
      - uses: actions/checkout@v4
      # Canonical validator script lives in molecule-ci, fetched fresh on
      # every run. The previous setup expected `.molecule-ci/scripts/` to
      # be vendored INTO each template repo, which drifted across the 8
      # template repos as the validator evolved. Single source of truth
      # eliminates that drift class entirely: every template runs the
      # same canonical contract check on every CI run.
      - uses: actions/checkout@v4
        with:
          repository: Molecule-AI/molecule-ci
          path: .molecule-ci-canonical
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Secret scan: the most important check. Always runs.
      - name: Check for secrets
        run: |
          python3 - << 'PYEOF'
          import os, re, sys
          from pathlib import Path

          PATTERNS = [
              re.compile(r'''["']sk-ant-[a-zA-Z0-9]{50,}["']'''),
              re.compile(r'''["']ghp_[a-zA-Z0-9]{36,}["']'''),
              re.compile(r'''["']AKIA[A-Z0-9]{16}["']'''),
              re.compile(r'''["'][a-zA-Z0-9/+=]{40}["']'''),
              re.compile(r'''["']sk_test_[a-zA-Z0-9]{24,}["']'''),
              re.compile(r'''["']Bearer\s+[a-zA-Z0-9_.-]{20,}["']'''),
              re.compile(r'''ghp_[a-zA-Z0-9]{36,}'''),
              re.compile(r'''sk-ant-[a-zA-Z0-9]{50,}'''),
          ]
          SKIP_DIRS = {'.molecule-ci', '.git', 'node_modules', '__pycache__'}
          EXTENSIONS = {'.yaml', '.yml', '.md', '.py', '.sh'}

          def is_false_positive(line):
              ctx = line.lower()
              return '...' in ctx or '<example' in ctx or '</example' in ctx

          root = Path(os.environ.get('GITHUB_WORKSPACE', '.'))
          warnings = []
          for dirpath, dirnames, filenames in os.walk(root):
              dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
              for filename in filenames:
                  if Path(filename).suffix not in EXTENSIONS:
                      continue
                  filepath = Path(dirpath) / filename
                  try:
                      with open(filepath, 'r', encoding='utf-8', errors='ignore') as f:
                          for lineno, line in enumerate(f.readlines(), 1):
                              for pattern in PATTERNS:
                                  for match in pattern.finditer(line):
                                      if not is_false_positive(line):
                                          warnings.append(f" {filepath}:{lineno}: {match.group(0)[:40]}...")
                  except Exception:
                      pass

          if warnings:
              print("::error::Potential secret found in committed files:")
              for w in warnings:
                  print(w)
              sys.exit(1)
          else:
              print("::notice::No secrets detected")
          PYEOF
      # Static-only validator: file existence checks, YAML parse,
      # AST inspection of adapter.py (no import). Doesn't execute
      # any third-party code; safe on fork PRs.
      - run: pip install pyyaml -q
      - run: python3 .molecule-ci-canonical/scripts/validate-workspace-template.py --static-only

  validate-runtime:
    name: Template validation (runtime)
    runs-on: ubuntu-latest
    timeout-minutes: 15
    needs: validate-static
    # Skip when the PR comes from a fork: those are external,
    # untrusted, and would let attackers run pip install / docker
    # build / adapter.py import on our runner. Internal PRs (head
    # repo == base repo, fork == false) and push events to internal
    # branches both keep full coverage.
    #
    # github.event.pull_request.head.repo.fork is null for non-PR
    # events (push, schedule, etc.), so the job defaults to running.
    if: github.event.pull_request.head.repo.fork != true
    steps:
      - uses: actions/checkout@v4
      - uses: actions/checkout@v4
        with:
          repository: Molecule-AI/molecule-ci
          path: .molecule-ci-canonical
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          # Cache pip against the calling repo's own requirements.txt
          # (the file we install one step below). Pointing the cache key
          # at the validator's own deps was decorative: pyyaml never
          # changes, so the key never invalidated even when the template
          # added a heavy dep like crewai.
          cache: "pip"
          cache-dependency-path: requirements.txt
      - run: pip install pyyaml -q
      # Install the template's runtime dependencies so the validator's
      # `check_adapter_runtime_load()` can import adapter.py the same way
      # the workspace container does at boot. Without this, a
      # syntactically-valid adapter that ImportErrors on a missing
      # transitive dep would build clean and crash on first user prompt.
      # The fallback (no requirements.txt) installs the runtime alone so
      # BaseAdapter is at least importable for the class-discovery check.
      - if: hashFiles('requirements.txt') != ''
        run: pip install -q -r requirements.txt
      - if: hashFiles('requirements.txt') == ''
        run: pip install -q molecule-ai-workspace-runtime
      # Full validator: includes adapter.py import (exec_module).
      - run: python3 .molecule-ci-canonical/scripts/validate-workspace-template.py
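      - name: Docker build smoke test
        if: hashFiles('Dockerfile') != ''
        # pipefail keeps a failing `docker build` from being masked by
        # `tail` exiting 0 (the default run shell is `bash -e`, which
        # does not set pipefail).
        run: |
          set -o pipefail
          docker build -t template-test . --no-cache 2>&1 | tail -5
          echo "✓ Docker build succeeded"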