main
8 Commits
d420b4a24f
ci: lock down validate-workspace-template against fork-PR untrusted code (P135)
Splits the reusable validator into two jobs to keep external fork
PRs from running arbitrary template code on the runner.
Background
The reusable workflow runs three primitives that execute
template-supplied code:
- pip install -r requirements.txt (setup.py + post-install hooks)
- importlib.exec_module(adapter) (top-level Python in adapter.py)
- docker build (RUN steps in Dockerfile)
Token scope is already minimal (contents: read), GitHub forced
fork-PR tokens read-only in 2021, and the workflow_call interface
doesn't accept secrets. So the actual exploit surface is "what can
a malicious actor do with arbitrary code execution on a GitHub-
hosted runner that has no useful credentials?" — answer: crypto-
mine, DNS-exfiltrate runner metadata, attempt lateral movement
within the runner's network. Annoying, not catastrophic, but a
real attack surface that this PR closes.
The fix
Two-job split:
validate-static Always runs, including external fork PRs.
File-content checks (secret scan, YAML parse,
AST inspection of adapter.py without import),
pip install only the validator's pyyaml dep
(not the template's requirements.txt). NO
third-party code execution.
validate-runtime Skipped when github.event.pull_request.head.
repo.fork == true. pip install requirements.txt
+ adapter import + docker build. Internal PRs
and push events to internal branches still get
the full coverage.
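The skip condition above can be sketched as a job-level `if:` expression (job names are taken from this description; the actual workflow may phrase the expression differently):

```yaml
jobs:
  validate-static:
    runs-on: ubuntu-latest   # always runs, external fork PRs included

  validate-runtime:
    # On push events there is no pull_request payload, so the
    # comparison against true still passes and internal pushes
    # keep full runtime coverage.
    if: github.event.pull_request.head.repo.fork != true
    runs-on: ubuntu-latest
```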
The validator script gains a --static-only flag that skips
check_adapter_runtime_load() (the function that calls
exec_module). The validate-static job uses it; validate-runtime
uses the existing full mode.
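A minimal sketch of how such a flag could gate the runtime check (the check-function names here are assumed stand-ins, not the validator's actual code):

```python
import argparse

# Hypothetical stand-ins for the validator's checks; only the gating
# logic mirrors the description above.
def check_config_yaml(template_dir):
    return []            # file-content only: safe for untrusted PRs

def check_secret_scan(template_dir):
    return []            # file-content only: safe for untrusted PRs

def check_adapter_runtime_load(template_dir):
    # Would exec_module() template code — must not run on fork PRs.
    return []

def run_checks(template_dir, static_only=False):
    errors = []
    errors += check_config_yaml(template_dir)
    errors += check_secret_scan(template_dir)
    if not static_only:
        errors += check_adapter_runtime_load(template_dir)
    return errors

def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("template_dir")
    parser.add_argument("--static-only", action="store_true",
                        help="skip checks that execute template code")
    return parser.parse_args(argv)
```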
Trade-off
External contributors get static feedback only on their PR. If
their template metadata passes static checks but breaks runtime
loading, branch protection on staging/main blocks the merge once
runtime validation runs (post-merge or after an internal
contributor reposts). Fewer false-positive CI failures for honest
external contributors; same coverage at the merge-protected
boundary.
What this does NOT close
- Maintainer-approved external PRs that consciously execute
third-party code. The maintainer must approve a workflow run
via GitHub's first-time-contributor gate; that's a human
decision, not a workflow-level gate.
- requirements.txt that pulls a malicious transitive dep from
PyPI even on internal PRs. Mitigated by branch-protection +
human review of PRs that touch requirements.txt.
Closes task #135.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2e40916b57
fix(validator): handle abstract intermediates + class-aliasing + lock GITHUB_TOKEN scope (#21)
Independent post-merge review of #19 surfaced two more findings. Both
shipped here.

Q3 — abstract intermediates + multiple concrete classes

The class-discovery filter from O1 (#19) only excluded BaseAdapter
itself. Two failure modes slipped through:
(a) A locally-defined abstract intermediate
    `class FrameworkAdapter(BaseAdapter): @abstractmethod ...` passed
    the filter, falsely satisfying "at least one concrete subclass"
    while still being non-instantiable at workspace boot.
(b) A template defining BOTH `class FrameworkAdapter(BaseAdapter)` AND
    `class ConcreteAdapter(FrameworkAdapter)` had both pass the
    filter, producing a silent ambiguity where the runtime's class
    discovery picks one per its resolution rules — wrong class loaded
    after a future runtime refactor.

Fixes:
- Add `not inspect.isabstract(obj)` to the discovery filter so
  abstract intermediates are excluded.
- Hard-error if `len(adapter_classes) > 1`, listing both names so the
  contributor knows exactly which classes are competing.

Three new tests pin the behaviors:
- test_abstract_intermediate_alone_does_not_count
- test_abstract_plus_concrete_passes_with_concrete_only
- test_multiple_concrete_baseadapter_subclasses_errors

Identity-based deduplication

Caught against the real langgraph template during smoke-testing of the
Q3 fix: production adapters often do `Adapter = ConcreteAdapter` as a
module-level alias for the runtime's discovery convention. `vars(mod)`
returns BOTH bindings pointing at the same class object, so the new
multiple-concrete-classes error fired falsely on every aliased
template.

Fix: deduplicate by `id(obj)` BEFORE counting, so the same class
object under multiple bindings counts once. A new regression test
test_aliased_concrete_class_is_deduplicated pins this against any
future filter regression.
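Put together, the fixed discovery filter could look roughly like this (a sketch: `BaseAdapter` is a local stand-in, and the real validator's names and error wording may differ):

```python
import abc
import inspect

class BaseAdapter:   # stand-in for molecule_runtime.adapters.base.BaseAdapter
    pass

def discover_adapter_classes(mod_vars: dict, module_name: str) -> list:
    seen_ids, found = set(), []
    for obj in mod_vars.values():
        if not (inspect.isclass(obj) and issubclass(obj, BaseAdapter)):
            continue
        if obj is BaseAdapter:
            continue
        if inspect.isabstract(obj):          # Q3(a): abstract intermediates
            continue
        if obj.__module__ != module_name:    # O1 (#19): ignore re-exports
            continue
        if id(obj) in seen_ids:              # alias fix: Adapter = ConcreteAdapter
            continue
        seen_ids.add(id(obj))
        found.append(obj)
    if len(found) > 1:                       # Q3(b): refuse silent ambiguity
        raise ValueError("multiple concrete adapter classes: "
                         + ", ".join(c.__name__ for c in found))
    return found
```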
Existing tests updated to use fully-concrete BaseAdapter subclasses
(matching production templates), since the new abstract-filter
correctly rejects partial stubs that don't override every abstract
method BaseAdapter declares (5 methods: name, display_name,
description, setup, create_executor).

Q5 — GITHUB_TOKEN scope lockdown

validate-workspace-template.yml runs untrusted-by-design code from the
calling template repo: pip post-install hooks, adapter.py imports,
Dockerfile RUN steps. Each of those primitives executes with
GITHUB_TOKEN in env. The workflow had no `permissions:` block,
defaulting to whatever the calling repo grants — often contents:
write.

Add `permissions: contents: read` at the workflow level.
Worst-case-with-token now drops to "read public repo state" — no
write to issues, no push to branches, no comment spam, no workflow
re-trigger. Partial mitigation; the deeper `pull_request_target`
discipline is bigger scope (tracked separately).

Verification:
- 47/47 tests pass (was 43; +3 abstract/multi-concrete, +1 alias)
- All 8 production templates pass the full updated validator
  end-to-end with 0 warnings / 0 errors

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
f125d68910
fix(validator): address post-merge review findings on #17 + #18 (#19)
Independent code review of #17 (adapter runtime-load) and #18 (schema
versioning) surfaced four Required and three Optional findings worth
fixing before the patterns harden into the codebase.

Required:

R1: Delete .molecule-ci/scripts/{validate-workspace-template,
    migrate-template}.py — a dead-vendored mirror. The new validator
    workflow invokes .molecule-ci-canonical/scripts/ (the canonical
    clone), not .molecule-ci/scripts/. The mirror was the exact drift
    class #90 is supposed to eliminate: the next contributor would
    edit one copy and silently diverge. Other workflows
    (validate-plugin, validate-org-template) still use the legacy path
    and keep their own scripts there — so removing OUR two files is
    asymmetric but correct, and the legacy path can phase out
    organically.

R2: validate-workspace-template.yml's `cache-dependency-path` pointed
    at the validator's own deps file (just `pyyaml>=6.0`). The pip
    cache key never invalidated when the template added
    crewai/langgraph/etc. Repoint to the calling repo's
    `requirements.txt`, which is the file the heavy install actually
    uses one step later.

R3: `_check_schema_v1` looped SCHEMA_V1_REQUIRED_KEYS and re-emitted
    "missing required key `template_schema_version`" — but the
    dispatcher already verified the field is present + int before
    reaching v1, so that branch was dead defensive code. Skip it
    explicitly with a comment, but keep the field in the constant for
    contract documentation + the unknown-keys filter.

R4: `_template_adapter_under_validation` was a fixed sys.modules key,
    meaning back-to-back invocations in the same Python process shared
    the slot. Use a per-call-unique name keyed on the absolute path's
    hash. No observed bug today; defensive-only.

Optional:

O1: The class-discovery filter now also requires
    `__module__ == module_name`. Without this, a
    `from molecule_runtime.adapters.base import AbstractCLIAdapter`
    re-export would count as a "real" adapter, masking the genuine
    "no concrete subclass" case the gate exists to catch.
Cheap, and it forward-proofs against any future abstract intermediate
the runtime might expose. Added a sibling test pinning the new
behavior.

O2: migrate-template.py's docstring claimed "uses ruamel.yaml when
    available" but the implementation only ever calls
    `yaml.safe_dump`. Replaced the lie with a clearer caveat block +
    a forward-pointer to ruamel-when-comments-detected as a future
    enhancement.

O3: Reordered the workflow so the secret-scan step runs BEFORE
    `pip install -r requirements.txt`. Same threat surface as the
    Docker build smoke (which already runs first), but cheap
    defense-in-depth: a malicious template PR adding a malicious dep
    to requirements.txt now has its post-install hook execute AFTER
    the secret scanner has already inspected the diff.

Test changes:
- test_adapter_with_no_baseadapter_subclass_errors updated for the
  new error message ("no concrete class inheriting from").
- New test_only_imported_baseadapter_subclass_does_not_count pins the
  O1 __module__-filter behavior.
- 43/43 tests pass (was 42/42 before the new test).
- Real langgraph template still passes the full validator end-to-end.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
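The R4 fix could be sketched like this (the helper name and hash length are illustrative, not the shipped code):

```python
import hashlib
import importlib.util
from pathlib import Path

def load_adapter_module(adapter_path: str):
    # Derive a per-call-unique sys.modules key from the file's
    # absolute path, so back-to-back validations in one Python
    # process never share a module slot.
    abs_path = str(Path(adapter_path).resolve())
    digest = hashlib.sha256(abs_path.encode()).hexdigest()[:12]
    module_name = f"_template_adapter_{digest}"
    spec = importlib.util.spec_from_file_location(module_name, abs_path)
    module = importlib.util.module_from_spec(spec)
    # Executes top-level template code — trusted refs only.
    spec.loader.exec_module(module)
    return module
```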
84a104a146
feat(validator): schema-version dispatch + migrate-template.py framework (#18)
Closes the schema-versioning workstream of #90. Sets up the machinery
for "we will be updating a lot" (the user's framing) without forcing
the first real schema bump to discover semantics under deadline
pressure. Today every template is at v1; this PR adds the framework,
ships zero behavior change for v1 templates, and reserves v2+ for
when there's a concrete reason to bump.

Validator changes:
- `KNOWN_SCHEMA_VERSIONS = {1}` — the set the validator currently
  accepts. Future bumps add to this set.
- `DEPRECATED_SCHEMA_VERSIONS: set[int] = set()` — versions accepted
  with a warning during a deprecation window.
- Per-version contract: `_check_schema_v1(config)` enforces the v1
  REQUIRED_KEYS / OPTIONAL_KEYS / KNOWN_RUNTIMES contract — exactly
  what the previous monolithic check_config_yaml did.
- Dispatch table: `SCHEMA_CHECKS = {1: _check_schema_v1}`. Versions
  not in the table hard-error.
- check_config_yaml() now: reads template_schema_version → emits a
  deprecation warning if applicable → dispatches to the right
  SCHEMA_CHECKS entry → unknown versions hard-error with actionable
  instructions ("add a SCHEMA_V<N> block").
- Schema versions are FROZEN once shipped: never edit a SCHEMA_V<N>
  constant in place. To bump, ADD v<N+1> alongside, deprecate v<N>,
  migrate consumers, drop v<N> next cycle. A header comment documents
  the discipline.

New script `migrate-template.py`:
- `MIGRATIONS: dict[int, Callable[[dict], dict]]` registry — each
  entry maps a SOURCE version to the function that produces the next
  version's dict. Empty today.
- `migrate_config(config, from, to)` chains migrations sequentially.
  Forward-only (errors on backward), errors on missing intermediate
  steps (never silently skips), and asserts every migration stamps
  its output's template_schema_version.
- CLI: `migrate-template.py [--from N] [--to M] [--dry-run] DIR`.
  Defaults: --from = whatever config.yaml declares, --to = highest
  version reachable from MIGRATIONS (currently 1, so a no-op).
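A minimal sketch of the dispatch + chaining machinery described above (constant contents, required-key set, and error wording are illustrative, not the shipped code):

```python
KNOWN_SCHEMA_VERSIONS = {1}
DEPRECATED_SCHEMA_VERSIONS: set = set()

def _check_schema_v1(config: dict) -> list:
    # Assumed subset of the v1 required keys, for illustration only.
    required = {"name", "runtime", "template_schema_version"}
    return [f"missing required key `{k}`"
            for k in sorted(required - config.keys())]

SCHEMA_CHECKS = {1: _check_schema_v1}

def check_config_yaml(config: dict) -> list:
    version = config.get("template_schema_version")
    if not isinstance(version, int):
        return ["template_schema_version must be an integer"]
    if version not in SCHEMA_CHECKS:
        return [f"unknown schema version {version}: "
                f"add a SCHEMA_V{version} block"]
    return SCHEMA_CHECKS[version](config)

MIGRATIONS: dict = {}   # source version -> migration fn; empty at v1

def migrate_config(config: dict, src: int, dst: int) -> dict:
    if dst < src:
        raise ValueError("migrations are forward-only")
    for v in range(src, dst):
        if v not in MIGRATIONS:   # never silently skip a step
            raise ValueError(f"no migration registered for v{v} -> v{v + 1}")
        config = MIGRATIONS[v](config)
        # Every migration must stamp its output's version.
        assert config["template_schema_version"] == v + 1
    return config
```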
Behavior change to the existing test_missing_required_keys_errors
test: previously the validator emitted 3 "missing required key"
errors when name/runtime/template_schema_version were all missing.
Now it short-circuits on the missing version with a single actionable
error — listing downstream missing keys is noise on top of the real
problem (no version means we can't pick a contract). The test was
updated to pin the new behavior; a new sibling test
(test_missing_required_keys_under_v1_dispatch_errors) pins that v1
still lists name/runtime/etc. when the version is present with v1.

Verification:
- 42/42 tests pass (20 prior + 9 new schema-dispatch tests in
  test_validate_workspace_template.py + 17 new migrator tests in
  test_migrate_template.py).
- Real langgraph template runs through the full updated validator
  end-to-end with 0 warnings / 0 errors.

This + #17 means #90 is done end-to-end:
- Phase 2: validator green on all 8 templates as a required check
  (already shipped)
- Phase 2.5: adapter.py runtime-load contract (#17)
- Phase 3: schema versioning + migration framework (this PR)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8309a55e6c
feat(validator): runtime-load check for adapter.py contract (#17)
Adds the third workstream of #90 (eliminate template-repo drift): a
strong contract check that exercises adapter.py the same way the
runtime does at workspace boot. Without this, a template can have a
syntactically valid Dockerfile + an adapter.py that ImportErrors at
runtime, build clean through the Docker smoke, and crash on the first
user prompt — exactly the human-error class #90 is meant to
eliminate.

Existing checks ranked from weakest to strongest:
1. check_adapter() — text-grep for legacy `molecule_ai` imports.
   Catches one specific footgun.
2. Docker build smoke — `docker build` succeeds. Doesn't RUN the
   image, so adapter.py is never imported. Misses every adapter-load
   bug.
3. (NEW) check_adapter_runtime_load — imports adapter.py via the same
   `importlib.util.spec_from_file_location` path the runtime uses,
   and asserts at least one class inherits from
   molecule_runtime.adapters.base.BaseAdapter.

Hard-error conditions:
- adapter.py raises any exception during import (SyntaxError,
  ImportError, NameError, etc.). The same exception would crash the
  workspace at boot.
- No class in the module inherits from BaseAdapter. The runtime's
  class discovery silently falls through to the default langgraph
  executor in this case — exactly the silent-failure shape the
  contract is meant to catch.

Skip conditions:
- No adapter.py exists. Templates without one inherit the default
  executor by design (policy, not drift).
- molecule-ai-workspace-runtime not importable in the validator env.
  Warns loudly so the CI-config bug surfaces, but doesn't hard-fail
  (we'd be reporting "your adapter is broken" when the actual cause
  is missing infra).

Workflow update: validate-workspace-template.yml now installs the
template's requirements.txt before invoking the validator (or falls
back to installing molecule-ai-workspace-runtime alone if the
template has no requirements.txt).
This satisfies the runtime-load check's import dependencies the same
way the workspace container does at boot — `pip install -r
requirements.txt`.

Verified locally:
- 20/20 tests in test_validate_workspace_template.py pass (14
  existing + 6 new).
- Real langgraph template passes the full new validator including
  runtime-load (0 warnings, 0 errors).
- Surveyed all 8 production templates' adapter.py shapes; every one
  already inherits from BaseAdapter, so this check turns green on
  first run with zero per-template fixups needed.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
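The check's error/skip semantics could be sketched as follows (a simplified illustration; the shipped validator's messages differ, and the fixed module name shown here was later made per-call-unique in #19):

```python
import importlib.util
from pathlib import Path

def check_adapter_runtime_load(template_dir):
    adapter_path = Path(template_dir) / "adapter.py"
    if not adapter_path.exists():
        return []     # skip: template inherits the default executor
    try:
        from molecule_runtime.adapters.base import BaseAdapter
    except ImportError:
        # Warn, don't hard-fail: the infra is broken, not the adapter.
        print("WARNING: runtime not importable; skipping runtime-load check")
        return []
    spec = importlib.util.spec_from_file_location(
        "_template_adapter_under_validation", adapter_path)
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)   # same path the runtime uses at boot
    except Exception as exc:              # SyntaxError, ImportError, NameError...
        return [f"adapter.py failed to import: {exc!r}"]
    has_adapter = any(
        isinstance(obj, type)
        and issubclass(obj, BaseAdapter)
        and obj is not BaseAdapter
        for obj in vars(module).values())
    if not has_adapter:
        return ["no class in adapter.py inherits from BaseAdapter"]
    return []
```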
73102cdaa9
feat(validate-workspace-template): strict drift gate + canonical-fetch workflow
P6 Phase 1: enforce the workspace-template contract via CI on every
template-repo push, eliminating the slow drift that produced 8
copies of a 28-line Dockerfile in different states of decay.
The previous validator (50 lines, soft warnings only) couldn't
catch the cache-trap pattern (Dockerfile missing ARG RUNTIME_VERSION)
that silently shipped the previous runtime wheel during cascade
publishes — observed five times in a row on 2026-04-27. Hardened
into structural checks that fail CI, not just warn:
- Dockerfile must base on python:3.11-slim
- Dockerfile must declare ARG RUNTIME_VERSION AND reference
${RUNTIME_VERSION} in a RUN block (the arg has to be in the
layer's command line for docker to hash it into the cache key)
- Dockerfile must create the agent uid-1000 user (Claude Code
refuses --dangerously-skip-permissions as root for safety)
- Dockerfile must end at molecule-runtime — directly via
ENTRYPOINT or via a wrapper script that exec's it (claude-code
has entrypoint.sh for gosu drop-priv; hermes has start.sh to
boot the hermes-agent daemon first; both are allowed)
- config.yaml must have name + runtime + integer
template_schema_version. Quoted "1" fails — observed previously
in a copy-pasted template that the YAML loader turned into str
- requirements.txt must declare molecule-ai-workspace-runtime
Also fixed: the original validator's warning telling adapter.py
NOT to import molecule_runtime was backwards — that's the
canonical package name post-#87. Now it warns on the legacy
molecule_ai prefix instead.
Reusable workflow change: instead of running
.molecule-ci/scripts/validate-workspace-template.py (a per-template
vendored copy that drifts as the validator evolves), the workflow
now checks out molecule-ci itself into .molecule-ci-canonical and
runs the canonical script from there. Single source of truth —
every template runs the SAME contract on every CI run. The legacy
.molecule-ci/scripts/ directories in each template repo can be
deleted in a Phase 2 cleanup PR.
14 unit tests pin the contract:
- canonical template passes
- claude-code-style custom entrypoint passes when the wrapper
exec's molecule-runtime
- 5 Dockerfile drift modes each error individually
- 3 config.yaml drift modes each error/warn
- requirements.txt missing-runtime errors
- legacy molecule_ai import warns
- regression cover: modern molecule_runtime import does NOT
trigger the (deleted) backwards warning
All 8 production template repos pass the new contract today —
this PR locks in the current good state, it does not force any
template-repo edits.
Contract documented at docs/template-contract.md so the rules are
discoverable without reading the validator.
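A couple of the Dockerfile checks above could be sketched as regex gates (illustrative patterns only; the shipped validator may be stricter):

```python
import re

def check_dockerfile(text: str) -> list:
    errors = []
    if not re.search(r"^FROM\s+python:3\.11-slim\b", text, re.M):
        errors.append("Dockerfile must base on python:3.11-slim")
    if not re.search(r"^ARG\s+RUNTIME_VERSION\b", text, re.M):
        errors.append("Dockerfile must declare ARG RUNTIME_VERSION")
    elif not re.search(r"^RUN\b.*\$\{RUNTIME_VERSION\}", text, re.M):
        # The arg must appear in a RUN line so Docker hashes it into
        # the layer cache key — declaring it alone is the cache trap.
        errors.append("RUN must reference ${RUNTIME_VERSION}")
    return errors
```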
a89b14a76c
ci: streamline workflows, add timeouts, and cache pip
- Remove redundant nested checkout of molecule-ci in workflow_call
  jobs
- Add timeout-minutes to prevent hung jobs (plugin: 10m, workspace:
  15m)
- Add pip cache using requirements.txt
- Add missing SKILL.md heading check in validate-plugin
- Add legacy-import and runtime-dependency warnings in workspace
  validation

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
6533e6eeac
fix: use standalone Python scripts instead of heredocs in workflows
Heredocs in GitHub Actions YAML were being echoed as script text
instead of being executed. Move the validation logic to scripts/ and
run it via 'python3 .molecule-ci/scripts/validate-*.py' after checking
out the molecule-ci repo at the .molecule-ci/ path.