Splits the reusable validator into two jobs to keep external fork
PRs from running arbitrary template code on the runner.
Background
The reusable workflow runs three primitives that execute
template-supplied code:
- pip install -r requirements.txt (setup.py + post-install hooks)
- importlib.exec_module(adapter) (top-level Python in adapter.py)
- docker build (RUN steps in Dockerfile)
Token scope is already minimal (contents: read), GitHub forced
fork-PR tokens read-only in 2021, and the workflow_call interface
doesn't accept secrets. So the actual exploit surface is "what can
a malicious actor do with arbitrary code execution on a GitHub-
hosted runner that has no useful credentials?" — answer: crypto-
mine, DNS-exfiltrate runner metadata, attempt lateral movement
within the runner's network. Annoying, not catastrophic, but a
real attack surface that this PR closes.
The fix
Two-job split:
- validate-static: always runs, including external fork PRs.
  File-content checks (secret scan, YAML parse, AST inspection of
  adapter.py without import); pip installs only the validator's
  pyyaml dep (not the template's requirements.txt). NO third-party
  code execution.
- validate-runtime: skipped when
  github.event.pull_request.head.repo.fork == true. pip install
  requirements.txt + adapter import + docker build. Internal PRs
  and push events to internal branches still get the full coverage.
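A minimal sketch of the split, assuming a standard two-job layout
(action versions, step details, and the script path are illustrative;
the job names and skip condition are the ones described here):

```yaml
jobs:
  validate-static:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install pyyaml   # validator's own dep only
      - run: python scripts/validate-workspace-template.py --static-only .

  validate-runtime:
    # Skipped for external fork PRs: no third-party code execution there.
    if: github.event.pull_request.head.repo.fork != true
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: python scripts/validate-workspace-template.py .
      - run: docker build .
```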
The validator script gains a --static-only flag that skips
check_adapter_runtime_load() (the function that calls
exec_module). The validate-static job uses it; validate-runtime
uses the existing full mode.
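The flag wiring can be sketched like this; the check registry layout is
hypothetical, but check_adapter_runtime_load is the function named above
as the one --static-only must skip:

```python
import argparse

def check_secrets(path):
    return "secrets"

def check_config_yaml(path):
    return "config"

def check_adapter_runtime_load(path):
    # The only check that executes template-supplied code (exec_module),
    # so --static-only must skip exactly this one.
    return "runtime-load"

def select_checks(static_only):
    checks = [check_secrets, check_config_yaml]
    if not static_only:
        checks.append(check_adapter_runtime_load)
    return checks

def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--static-only", action="store_true")
    parser.add_argument("path")
    args = parser.parse_args(argv)
    return [check(args.path) for check in select_checks(args.static_only)]
```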
Trade-off
External contributors get static feedback only on their PR. If
their template metadata passes static checks but breaks runtime
loading, branch protection on staging/main blocks the merge once
runtime validation runs (post-merge or after an internal
contributor reposts). Fewer false-positive CI failures for honest
external contributors; same coverage at the merge-protected
boundary.
What this does NOT close
- Maintainer-approved external PRs, where a maintainer consciously
  chooses to run third-party code. The maintainer must approve a
  workflow run via GitHub's first-time-contributor gate; that's a
  human decision, not a workflow-level gate.
- requirements.txt that pulls a malicious transitive dep from
PyPI even on internal PRs. Mitigated by branch-protection +
human review of PRs that touch requirements.txt.
Closes task #135.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Independent post-merge review of #19 surfaced two more findings.
Both shipped here.
Q3 — abstract intermediates + multiple-concrete-classes.
The class-discovery filter from O1 (#19) only excluded BaseAdapter
itself. Two failure modes slipped through:
(a) A locally-defined abstract intermediate
`class FrameworkAdapter(BaseAdapter): @abstractmethod ...`
passed the filter, falsely satisfying "at least one
concrete subclass" while still being non-instantiable at
workspace boot.
(b) A template defining BOTH `class FrameworkAdapter(BaseAdapter)`
AND `class ConcreteAdapter(FrameworkAdapter)` had both pass
the filter, producing a silent ambiguity where the runtime's
class-discovery picks one per its resolution rules — wrong
class loaded after a future runtime refactor.
Fixes:
- Add `not inspect.isabstract(obj)` to the discovery filter so
abstract intermediates are excluded.
- Hard-error if `len(adapter_classes) > 1` listing both names so
the contributor knows exactly which classes are competing.
Three new tests pin the behaviors:
- test_abstract_intermediate_alone_does_not_count
- test_abstract_plus_concrete_passes_with_concrete_only
- test_multiple_concrete_baseadapter_subclasses_errors
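Both fixes can be sketched against a stand-in BaseAdapter (names and
error text are illustrative; the real filter has more conditions):

```python
import abc
import inspect

class BaseAdapter(abc.ABC):          # stand-in for the runtime's base class
    @abc.abstractmethod
    def setup(self): ...

def discover_adapter_classes(module_vars):
    """Return concrete BaseAdapter subclasses; hard-error on ambiguity."""
    classes = [
        obj for obj in module_vars.values()
        if inspect.isclass(obj)
        and issubclass(obj, BaseAdapter)
        and obj is not BaseAdapter
        and not inspect.isabstract(obj)   # excludes abstract intermediates
    ]
    if len(classes) > 1:
        names = ", ".join(c.__name__ for c in classes)
        raise ValueError(f"multiple concrete BaseAdapter subclasses: {names}")
    return classes

# Failure mode (a): an abstract intermediate no longer counts.
class FrameworkAdapter(BaseAdapter):
    @abc.abstractmethod
    def extra(self): ...

class ConcreteAdapter(FrameworkAdapter):
    def setup(self): return "ok"
    def extra(self): return "ok"
```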
Identity-based deduplication.
Caught against the real langgraph template during smoke-testing
the Q3 fix: production adapters often do
`Adapter = ConcreteAdapter` as a module-level alias for the
runtime's discovery convention. `vars(mod)` returns BOTH bindings
pointing at the same class object, so the new
multiple-concrete-classes error fired falsely on every aliased
template.
Fix: deduplicate by `id(obj)` BEFORE counting, so the same class
object under multiple bindings counts once. New regression test
test_aliased_concrete_class_is_deduplicated pins this against
any future filter regression.
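The dedup pass itself is small; a sketch (the surrounding filter is
elided here):

```python
def dedupe_by_identity(classes):
    """Collapse multiple module-level bindings of the same class object.

    vars(mod) yields every binding, so `Adapter = ConcreteAdapter`
    produces two entries pointing at one class; count it once.
    """
    seen = {}
    for obj in classes:
        seen.setdefault(id(obj), obj)
    return list(seen.values())

class ConcreteAdapter:      # stand-in class
    pass

Adapter = ConcreteAdapter   # the module-level alias convention
```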
Existing tests updated to use fully-concrete BaseAdapter subclasses
(matching production templates) since the new abstract-filter
correctly rejects partial stubs that don't override every abstract
method BaseAdapter declares (5 methods: name, display_name,
description, setup, create_executor).
Q5 — GITHUB_TOKEN scope lockdown.
validate-workspace-template.yml runs untrusted-by-design code from
the calling template repo: pip post-install hooks, adapter.py
imports, Dockerfile RUN steps. Each of those primitives executes
with GITHUB_TOKEN in env. The workflow had no `permissions:`
block, defaulting to whatever the calling repo grants — often
contents: write.
Add `permissions: contents: read` at the workflow level. Worst-
case-with-token now drops to "read public repo state" — no write
to issues, no push to branches, no comment-spam, no workflow
re-trigger. Partial mitigation; the deeper `pull_request_target`
discipline is bigger scope (tracked separately).
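The lockdown itself is a three-line block at the top of the workflow
file:

```yaml
# Workflow-level token scope: read-only on repo contents, nothing else.
permissions:
  contents: read
```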
Verification:
- 47/47 tests pass (was 43; +3 abstract/multi-concrete, +1 alias)
- All 8 production templates pass the full updated validator
end-to-end with 0 warnings / 0 errors
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cleanup of #19's commit, which inadvertently included scripts/__pycache__/
.pyc files generated by running pytest locally during the review-
followup work. The repo's .gitignore had no Python-cache section at
all, so nothing prevented this — adding it now to make the same
mistake structurally impossible.
Files removed from tracking (still ignored locally going forward):
- scripts/__pycache__/migrate-template.cpython-313.pyc
- scripts/__pycache__/test_migrate_template.cpython-313-pytest-9.0.3.pyc
- scripts/__pycache__/test_validate_workspace_template.cpython-313-pytest-9.0.3.pyc
- scripts/__pycache__/validate-workspace-template.cpython-313.pyc
Gitignore additions cover the standard set:
__pycache__/, *.pyc, *.pyo, *.pyd, .pytest_cache/, .mypy_cache/,
.ruff_cache/
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Independent code review of #17 (adapter runtime-load) and #18 (schema
versioning) surfaced four Required and three Optional findings worth
fixing before the patterns harden into the codebase.
Required:
R1: Delete .molecule-ci/scripts/{validate-workspace-template,
migrate-template}.py — dead-vendored mirror. The new validator
workflow invokes .molecule-ci-canonical/scripts/ (the canonical
clone), not .molecule-ci/scripts/. The mirror was the exact drift
class #90 is supposed to eliminate: next contributor would edit
one copy and silently diverge. Other workflows (validate-plugin,
validate-org-template) still use the legacy path and keep their
own scripts there — so removing OUR two files is asymmetric but
correct, and the legacy path can phase out organically.
R2: validate-workspace-template.yml's `cache-dependency-path` pointed
at the validator's own deps file (just `pyyaml>=6.0`). Pip cache
key never invalidated when the template added crewai/langgraph/
etc. Repoint to the calling repo's `requirements.txt`, which is
the file the heavy install actually uses one step later.
R3: `_check_schema_v1` looped `SCHEMA_V1_REQUIRED_KEYS` and re-emitted
"missing required key `template_schema_version`" — but the
dispatcher already verified the field is present + int before
reaching v1, so that branch was dead defensive code. Skip it
explicitly with a comment, but keep the field in the constant for
contract documentation + the unknown-keys filter.
R4: `_template_adapter_under_validation` was a fixed sys.modules key,
meaning back-to-back invocations in the same Python process
shared the slot. Use a per-call-unique name keyed on the absolute
path's hash. No observed bug today; defensive-only.
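A sketch of the per-call-unique loading path, assuming the standard
importlib file-location machinery (the name prefix is illustrative):

```python
import hashlib
import importlib.util
import sys
from pathlib import Path

def load_adapter_module(adapter_path):
    """Load adapter.py under a sys.modules key derived from its path.

    Back-to-back validations of different templates in one Python
    process get distinct keys instead of sharing one fixed slot.
    """
    abs_path = str(Path(adapter_path).resolve())
    digest = hashlib.sha256(abs_path.encode()).hexdigest()[:12]
    mod_name = f"_template_adapter_under_validation_{digest}"
    spec = importlib.util.spec_from_file_location(mod_name, abs_path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[mod_name] = module    # register before exec, as importlib does
    spec.loader.exec_module(module)
    return mod_name, module
```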
Optional:
O1: Class-discovery filter now also requires `__module__ == module_name`.
    Without this, a `from molecule_runtime.adapters.base import
    AbstractCLIAdapter` re-export would count as a "real" adapter,
    masking the genuine "no concrete subclass" case the gate exists
    to catch. Cheap, and it future-proofs against any abstract
    intermediate the runtime might expose. Added a sibling test
    pinning the new behavior.
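The O1 condition reduces to one extra comparison; a sketch with
stand-in classes (the `__module__` values are set by hand here purely
to mimic a re-export vs. a locally defined class):

```python
import inspect

class AbstractCLIAdapter:   # stands in for a class re-exported from the runtime
    pass
AbstractCLIAdapter.__module__ = "molecule_runtime.adapters.base"

class LocalAdapter:         # stands in for a class defined in adapter.py
    pass
LocalAdapter.__module__ = "adapter"

def is_locally_defined(obj, module_name):
    # A bare re-export of a runtime class must not satisfy the
    # "at least one real adapter" gate.
    return inspect.isclass(obj) and obj.__module__ == module_name
```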
O2: migrate-template.py's docstring claimed "uses ruamel.yaml when
available" but the implementation only ever calls `yaml.safe_dump`.
Replaced the lie with a clearer caveat block + a forward-pointer
to ruamel-when-comments-detected as a future enhancement.
O3: Reordered the workflow so the secret-scan step runs BEFORE
`pip install -r requirements.txt`. Same threat surface as the
Docker build smoke (which already runs first), but cheap defense-
in-depth: a malicious template PR adding a malicious dep to
requirements.txt now has its post-install hook execute AFTER the
secret scanner has already inspected the diff.
Test changes:
- test_adapter_with_no_baseadapter_subclass_errors updated for the
new error message ("no concrete class inheriting from").
- New test_only_imported_baseadapter_subclass_does_not_count pins
the O1 __module__-filter behavior.
- 43/43 tests pass (was 42/42 before the new test).
- Real langgraph template still passes the full validator end-to-end.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the schema-versioning workstream of #90. Sets up the machinery
for "we will be updating a lot" (the user's framing) without forcing
the first real schema bump to discover semantics under deadline
pressure. Today every template is at v1; this PR adds the framework,
ships zero behavior change for v1 templates, and reserves v2+ for
when there's a concrete reason to bump.
Validator changes:
- `KNOWN_SCHEMA_VERSIONS = {1}` — the set the validator currently
accepts. Future bumps add to this set.
- `DEPRECATED_SCHEMA_VERSIONS: set[int] = set()` — versions accepted
with warning during a deprecation window.
- Per-version contract: `_check_schema_v1(config)` enforces the v1
REQUIRED_KEYS / OPTIONAL_KEYS / KNOWN_RUNTIMES contract — exactly
what the previous monolithic check_config_yaml did.
- Dispatch table: `SCHEMA_CHECKS = {1: _check_schema_v1}`. Versions
that aren't in the table hard-error.
- check_config_yaml() now: reads template_schema_version → emits
deprecation warning if applicable → dispatches to the right
SCHEMA_CHECKS entry → unknown versions hard-error with actionable
instructions ("add a SCHEMA_V<N> block").
- Schema versions are FROZEN once shipped: never edit a SCHEMA_V<N>
constant in place. To bump, ADD v<N+1> alongside, deprecate v<N>,
migrate consumers, drop v<N> next cycle. Header comment documents
the discipline.
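The dispatch shape, sketched with a reduced v1 contract (the key list
and error wording are illustrative; the constants and flow match the
description above):

```python
KNOWN_SCHEMA_VERSIONS = {1}
DEPRECATED_SCHEMA_VERSIONS: set[int] = set()

def _check_schema_v1(config):
    errors = []
    for key in ("name", "runtime"):           # reduced v1 required-key set
        if key not in config:
            errors.append(f"missing required key `{key}`")
    return errors

SCHEMA_CHECKS = {1: _check_schema_v1}

def check_config_yaml(config):
    version = config.get("template_schema_version")
    if not isinstance(version, int):
        # Covers both missing and quoted-"1" (str) cases.
        return ["missing/non-integer `template_schema_version`; "
                "cannot pick a schema contract"]
    messages = []
    if version in DEPRECATED_SCHEMA_VERSIONS:
        messages.append(f"warning: schema v{version} is deprecated")
    if version not in SCHEMA_CHECKS:
        return [f"unknown schema version {version}: "
                f"add a SCHEMA_V{version} block"]
    return messages + SCHEMA_CHECKS[version](config)
```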
New script `migrate-template.py`:
- `MIGRATIONS: dict[int, Callable[[dict], dict]]` registry — each
entry maps a SOURCE version to the function that produces the
next version's dict. Empty today.
- `migrate_config(config, from_version, to_version)` chains migrations
  sequentially. Forward-only (errors on backward), errors on missing
  intermediate steps (never silently skips), and asserts that every
  migration stamps its output's template_schema_version.
- CLI: `migrate-template.py [--from N] [--to M] [--dry-run] DIR`.
Defaults: --from = whatever config.yaml declares, --to = highest
reachable from MIGRATIONS (currently 1, so a no-op).
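The chaining logic can be sketched as below. The registry is empty in
the shipped script; a hypothetical v1→v2 migration is included here
only so the chaining and the stamp assertion are visible:

```python
from typing import Callable, Dict

def _migrate_v1_to_v2(config: dict) -> dict:
    # Hypothetical migration, for illustration only.
    out = dict(config)
    out["template_schema_version"] = 2
    return out

MIGRATIONS: Dict[int, Callable[[dict], dict]] = {1: _migrate_v1_to_v2}

def migrate_config(config: dict, from_version: int, to_version: int) -> dict:
    if to_version < from_version:
        raise ValueError("backward migration is not supported")
    current = dict(config)
    for v in range(from_version, to_version):
        if v not in MIGRATIONS:
            # Never silently skip a missing intermediate step.
            raise ValueError(f"no migration registered for v{v} -> v{v + 1}")
        current = MIGRATIONS[v](current)
        assert current.get("template_schema_version") == v + 1, \
            "every migration must stamp its output's version"
    return current
```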
Behavior change to the existing
test_missing_required_keys_errors test:
Previously the validator emitted 3 "missing required key" errors
when name/runtime/template_schema_version were all missing. Now it
short-circuits on missing version with a single actionable error —
listing downstream missing keys is noise on top of the real
problem (no version means we can't pick a contract). The test was
updated to pin the new behavior; a new sibling test
(test_missing_required_keys_under_v1_dispatch_errors) pins that v1
still lists name/runtime/etc. when present-with-v1.
Verification:
- 42/42 tests pass (20 prior + 9 new schema-dispatch tests in
test_validate_workspace_template.py + 17 new migrator tests in
test_migrate_template.py).
- Real langgraph template runs through the full updated validator
end-to-end with 0 warnings / 0 errors.
This + #17 means #90 is done end-to-end:
- Phase 2: validator green on all 8 templates as a required check (already shipped)
- Phase 2.5: adapter.py runtime-load contract (#17)
- Phase 3: schema versioning + migration framework (this PR)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the third workstream of #90 (eliminate template repo drift): a
strong contract check that exercises adapter.py the same way the
runtime does at workspace boot. Without this, a template can have a
syntactically-valid Dockerfile + an adapter.py that ImportErrors at
runtime, build clean through Docker smoke, and crash on first user
prompt — exactly the human-error class #90 is meant to eliminate.
Existing checks ranked from weakest to strongest:
1. check_adapter() — text-grep for legacy `molecule_ai` imports.
   Catches one specific footgun.
2. Docker build smoke — `docker build` succeeds. Doesn't RUN the
   image, so adapter.py is never imported. Misses every adapter-load
   bug.
3. (NEW) check_adapter_runtime_load — imports adapter.py via the same
   `importlib.spec_from_file_location` path the runtime uses, and
   asserts at least one class inherits from
   molecule_runtime.adapters.base.BaseAdapter.
Hard-error conditions:
- adapter.py raises any exception during import (SyntaxError,
ImportError, NameError, etc.). Same exception would crash the
workspace at boot.
- No class in the module inherits from BaseAdapter. The runtime's
class-discovery silently falls through to the default langgraph
executor in this case — exactly the silent-failure shape the
contract is meant to catch.
Skip conditions:
- No adapter.py exists. Templates without one inherit the default
executor by design (policy, not drift).
- molecule-ai-workspace-runtime not importable in the validator
env. Warns loudly so the CI-config bug surfaces, but doesn't
hard-fail (we'd be reporting "your adapter is broken" when the
actual cause is missing infra).
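Putting the error and skip conditions together, the check's shape is
roughly the following (message text and the fixed module name are
illustrative):

```python
import importlib.util
from pathlib import Path

def check_adapter_runtime_load(template_dir):
    """Sketch of the runtime-load contract check."""
    adapter_path = Path(template_dir) / "adapter.py"
    if not adapter_path.exists():
        return []   # skip: template inherits the default executor by design
    try:
        from molecule_runtime.adapters.base import BaseAdapter
    except ImportError:
        # Warn loudly, don't hard-fail: the infra is missing, not the adapter.
        return ["WARN: molecule-ai-workspace-runtime not importable; "
                "skipping runtime-load check (CI config bug?)"]
    spec = importlib.util.spec_from_file_location(
        "adapter_under_test", adapter_path)
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)   # same path the runtime takes at boot
    except Exception as exc:
        return [f"ERROR: adapter.py failed to import: {exc!r}"]
    has_subclass = any(
        isinstance(obj, type)
        and issubclass(obj, BaseAdapter)
        and obj is not BaseAdapter
        for obj in vars(module).values()
    )
    if not has_subclass:
        return ["ERROR: no class inherits from BaseAdapter; the runtime "
                "would silently fall through to the default executor"]
    return []
```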
Workflow update: validate-workspace-template.yml now installs the
template's requirements.txt before invoking the validator (or
falls back to installing molecule-ai-workspace-runtime alone if the
template has no requirements.txt). This satisfies the runtime-load
check's import dependencies the same way the workspace container
does at boot — `pip install -r requirements.txt`.
Verified locally:
- 20/20 tests in test_validate_workspace_template.py pass
(14 existing + 6 new).
- Real langgraph template passes the full new validator including
runtime-load (0 warnings, 0 errors).
- Surveyed all 8 production templates' adapter.py shapes; every
one already inherits from BaseAdapter, so this check turns green
on first run with zero per-template fixups needed.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P6 Phase 1: enforce the workspace-template contract via CI on every
template-repo push, eliminating the slow drift that produced 8
copies of a 28-line Dockerfile in different states of decay.
The previous validator (50 lines, soft warnings only) couldn't
catch the cache-trap pattern (Dockerfile missing ARG RUNTIME_VERSION)
that silently shipped the previous runtime wheel during cascade
publishes — observed five times in a row on 2026-04-27. Hardened
into structural checks that fail CI, not just warn:
- Dockerfile must base on python:3.11-slim
- Dockerfile must declare ARG RUNTIME_VERSION AND reference
${RUNTIME_VERSION} in a RUN block (the arg has to be in the
layer's command line for docker to hash it into the cache key)
- Dockerfile must create the agent uid-1000 user (Claude Code
refuses --dangerously-skip-permissions as root for safety)
- Dockerfile must end at molecule-runtime — directly via
ENTRYPOINT or via a wrapper script that exec's it (claude-code
has entrypoint.sh for gosu drop-priv; hermes has start.sh to
boot the hermes-agent daemon first; both are allowed)
- config.yaml must have name + runtime + integer
template_schema_version. Quoted "1" fails — observed previously
in a copy-pasted template that the YAML loader turned into str
- requirements.txt must declare molecule-ai-workspace-runtime
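A contract-conforming Dockerfile can be sketched as follows (the
version pin syntax, user name, and layer layout are illustrative; only
the contract points come from this commit):

```dockerfile
FROM python:3.11-slim

ARG RUNTIME_VERSION
# ${RUNTIME_VERSION} must appear in the RUN command line so Docker
# hashes it into the layer cache key -- a version bump busts the cache.
RUN pip install "molecule-ai-workspace-runtime==${RUNTIME_VERSION}"

# uid-1000 user: Claude Code refuses --dangerously-skip-permissions as root.
RUN useradd --uid 1000 --create-home agent
USER agent

ENTRYPOINT ["molecule-runtime"]
```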
Also fixed: the original validator's warning telling adapter.py
NOT to import molecule_runtime was backwards — that's the
canonical package name post-#87. Now it warns on the legacy
molecule_ai prefix instead.
Reusable workflow change: instead of running
.molecule-ci/scripts/validate-workspace-template.py (a per-template
vendored copy that drifts as the validator evolves), the workflow
now checks out molecule-ci itself into .molecule-ci-canonical and
runs the canonical script from there. Single source of truth —
every template runs the SAME contract on every CI run. The legacy
.molecule-ci/scripts/ directories in each template repo can be
deleted in a Phase 2 cleanup PR.
14 unit tests pin the contract:
- canonical template passes
- claude-code-style custom entrypoint passes when the wrapper
exec's molecule-runtime
- 5 Dockerfile drift modes each error individually
- 3 config.yaml drift modes each error/warn
- requirements.txt missing-runtime errors
- legacy molecule_ai import warns
- regression cover: modern molecule_runtime import does NOT
trigger the (deleted) backwards warning
All 8 production template repos pass the new contract today —
this PR locks in the current good state, it does not force any
template-repo edits.
Contract documented at docs/template-contract.md so the rules are
discoverable without reading the validator.
The grep-based secrets check matched literal credential patterns in
documentation (e.g., "sk-ant-..." in CLAUDE.md examples), causing
false-positive CI failures.
Replace with a Python script that:
- Skips .molecule-ci/ directory entirely
- Uses context-aware matching (requires quotes or assignment context)
- Filters out documentation examples with "..." or <example> markers
- Handles all three reusable workflows (plugin, workspace-template, org-template)
Also in this change:
- Remove redundant nested checkout of molecule-ci in workflow_call jobs
- Add timeout-minutes to prevent hung jobs (plugin: 10m, workspace: 15m)
- Add pip cache keyed on requirements.txt
- Add missing SKILL.md heading check in validate-plugin
- Add legacy-import and runtime-dependency warnings in workspace validation
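The context-aware matching can be sketched like this; the credential
pattern and the placeholder filters are illustrative, not the script's
actual rule set:

```python
import re

CREDENTIAL = re.compile(
    r"""(?:["']|=\s*["']?)          # require quote or assignment context
        (sk-ant-[A-Za-z0-9_-]{8,})  # example Anthropic-style key shape
    """,
    re.VERBOSE,
)

def scan_line(line):
    """Return a matched credential, or None for doc examples/bare text."""
    if "..." in line or "<example>" in line:
        return None           # documentation placeholder, not a live secret
    match = CREDENTIAL.search(line)
    return match.group(1) if match else None
```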
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Heredocs in GitHub Actions YAML were being echoed as script text
instead of executed. Move the validation logic to scripts/ and run it
via `python3 .molecule-ci/scripts/validate-*.py` after checking out
the molecule-ci repo at the .molecule-ci/ path.