molecule-ci/scripts/migrate-template.py
Hongming Wang 84a104a146
feat(validator): schema-version dispatch + migrate-template.py framework (#18)
Closes the schema-versioning workstream of #90. Sets up the machinery
for "we will be updating a lot" (the user's framing) without forcing
the first real schema bump to discover semantics under deadline
pressure. Today every template is at v1; this PR adds the framework,
ships zero behavior change for v1 templates, and reserves v2+ for
when there's a concrete reason to bump.

Validator changes:

  - `KNOWN_SCHEMA_VERSIONS = {1}` — the set the validator currently
    accepts. Future bumps add to this set.
  - `DEPRECATED_SCHEMA_VERSIONS: set[int] = set()` — versions accepted
    with warning during a deprecation window.
  - Per-version contract: `_check_schema_v1(config)` enforces the v1
    REQUIRED_KEYS / OPTIONAL_KEYS / KNOWN_RUNTIMES contract — exactly
    what the previous monolithic check_config_yaml did.
  - Dispatch table: `SCHEMA_CHECKS = {1: _check_schema_v1}`. Versions
    that aren't in the table hard-error.

  - check_config_yaml() now: reads template_schema_version → emits
    deprecation warning if applicable → dispatches to the right
    SCHEMA_CHECKS entry → unknown versions hard-error with actionable
    instructions ("add a SCHEMA_V<N> block").

  - Schema versions are FROZEN once shipped: never edit a SCHEMA_V<N>
    constant in place. To bump, ADD v<N+1> alongside, deprecate v<N>,
    migrate consumers, drop v<N> next cycle. Header comment documents
    the discipline.
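
The dispatch flow above can be sketched roughly as follows. This is an illustrative reconstruction, not the validator's actual code: the constant and function names come from this commit message, while the `(errors, warnings)` return shape, the message wording, and the two-key required set are assumptions made for the example.

```python
from __future__ import annotations

from typing import Callable

KNOWN_SCHEMA_VERSIONS = {1}
DEPRECATED_SCHEMA_VERSIONS: set[int] = set()

# Assumed subset of the v1 contract, for illustration only.
_V1_REQUIRED_KEYS = ("name", "runtime")


def _check_schema_v1(config: dict) -> list[str]:
    """Return one error per missing required key (assumed v1 behavior)."""
    return [f"missing required key: {k}" for k in _V1_REQUIRED_KEYS if k not in config]


# Dispatch table: one frozen check function per schema version.
SCHEMA_CHECKS: dict[int, Callable[[dict], list[str]]] = {1: _check_schema_v1}


def check_config_yaml(config: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for an already-parsed config.yaml."""
    warnings: list[str] = []
    version = config.get("template_schema_version")
    if version is None:
        # Short-circuit: without a version there is no contract to check against.
        return ["missing template_schema_version"], warnings
    check = SCHEMA_CHECKS.get(version)
    if check is None:
        # Unknown versions hard-error with an actionable hint.
        return [f"unknown template_schema_version {version!r}: add a SCHEMA_V{version} block"], warnings
    if version in DEPRECATED_SCHEMA_VERSIONS:
        warnings.append(f"template_schema_version {version} is deprecated")
    return check(config), warnings
```

Keeping each version's check behind its own table entry is what lets a future v2 be added alongside v1 without editing the v1 function.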

New script `migrate-template.py`:

  - `MIGRATIONS: dict[int, Callable[[dict], dict]]` registry — each
    entry maps a SOURCE version to the function that produces the
    next version's dict. Empty today.
  - `migrate_config(config, from, to)` chains migrations sequentially.
    Forward-only (errors on backward), errors on missing intermediate
    steps (never silently skip), asserts every migration stamps its
    output's template_schema_version.
  - CLI: `migrate-template.py [--from N] [--to M] [--dry-run] DIR`.
    Defaults: --from = whatever config.yaml declares, --to = highest
    reachable from MIGRATIONS (currently 1, so a no-op).
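
As a hedged illustration of what a registry entry might look like once a v2 exists: the `description` to `summary` rename below is entirely hypothetical (no v2 has been defined), and `chain` is a simplified stand-in for `migrate_config` that raises `ValueError` instead of exiting the process.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Callable


def _migrate_v1_to_v2(config: dict) -> dict:
    """Hypothetical v1 -> v2 step: rename `description` to `summary`."""
    out = deepcopy(config)  # pure: never mutate the caller's dict
    if "description" in out:
        out["summary"] = out.pop("description")
    out["template_schema_version"] = 2  # every migration must stamp its output
    return out


# Keyed by SOURCE version, as in the real registry.
MIGRATIONS: dict[int, Callable[[dict], dict]] = {1: _migrate_v1_to_v2}


def chain(config: dict, from_version: int, to_version: int) -> dict:
    """Apply registered steps one hop at a time, forward only."""
    if to_version < from_version:
        raise ValueError("cannot migrate backward")
    out = deepcopy(config)
    for current in range(from_version, to_version):
        step = MIGRATIONS.get(current)
        if step is None:
            raise ValueError(f"no migration registered for v{current} -> v{current + 1}")
        out = step(out)
        if out.get("template_schema_version") != current + 1:
            raise ValueError(f"MIGRATIONS[{current}] did not stamp v{current + 1}")
    return out


if __name__ == "__main__":
    v1 = {"template_schema_version": 1, "name": "demo", "description": "hello"}
    print(chain(v1, 1, 2))
```

Because each step only knows about its own source version, a later v2 to v3 step would slot in as `MIGRATIONS[2]` without touching the v1 step.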

Behavior change to the existing
test_missing_required_keys_errors test:

  Previously the validator emitted 3 "missing required key" errors
  when name/runtime/template_schema_version were all missing. Now it
  short-circuits on missing version with a single actionable error —
  listing downstream missing keys is noise on top of the real
  problem (no version means we can't pick a contract). The test was
  updated to pin the new behavior; a new sibling test
  (test_missing_required_keys_under_v1_dispatch_errors) pins that the
  v1 check still lists missing name/runtime/etc. when
  template_schema_version is present and set to 1.

Verification:

  - 42/42 tests pass (20 prior + 9 new schema-dispatch tests in
    test_validate_workspace_template.py + 17 new migrator tests in
    test_migrate_template.py).
  - Real langgraph template runs through the full updated validator
    end-to-end with 0 warnings / 0 errors.

This + #17 means #90 is done end-to-end:
  - Phase 2: validator green on all 8 templates as a required check (already shipped)
  - Phase 2.5: adapter.py runtime-load contract (#17)
  - Phase 3: schema versioning + migration framework (this PR)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 12:07:04 -07:00

213 lines
7.7 KiB
Python
Executable File

#!/usr/bin/env python3
"""Migrate a workspace template's config.yaml across schema versions.

Companion to validate-workspace-template.py. Whenever the validator
adds a new schema version, this script gets a corresponding entry in
MIGRATIONS so each consumer template can mechanically upgrade rather
than every maintainer figuring out the field changes by hand.

Discipline (matches the validator's header):

  1. Validator gets a SCHEMA_V<N+1> block + KNOWN_SCHEMA_VERSIONS bump.
  2. This script gets `MIGRATIONS[N]` defined: a function that takes
     a v<N> dict and returns a v<N+1> dict. Pure, deterministic, no
     I/O, so that migrations compose: v1 → v2 → v3 just chains them.
  3. Each migration is FROZEN once shipped. If a v2 migration needs
     fixing post-ship, ship it as v3 with the corrective migration.
  4. Consumers run this script (one PR per template repo) before the
     deprecation window for v<N> closes.

Usage:

    # Migrate the template in cwd from its current version to the latest
    python3 scripts/migrate-template.py .

    # Migrate to a specific version (bounded; useful when a deprecation
    # window is closing and you want to skip ahead)
    python3 scripts/migrate-template.py --to 3 .

    # Force the source version (override config.yaml's declared version)
    python3 scripts/migrate-template.py --from 1 --to 2 .

    # Dry-run: print the migrated YAML without writing
    python3 scripts/migrate-template.py --dry-run .

The script reads and writes config.yaml with PyYAML (`yaml.safe_load`
and `yaml.safe_dump` with sorted keys), so values round-trip but
comments and custom key ordering do not survive a migration.
Migrations should ONLY mutate keys they're explicitly versioning;
leave everything else alone so a consumer template's customizations
survive.
"""
from __future__ import annotations

import argparse
import sys
from copy import deepcopy
from pathlib import Path
from typing import Callable

import yaml


# ──────────────────────────────────────────── migrations registry
# Each entry maps a SOURCE version to the function that produces the
# next version's dict. Currently empty: no v2 yet. The first time a
# real schema bump lands, MIGRATIONS[1] gets defined alongside the
# validator's SCHEMA_V2 block.
MIGRATIONS: dict[int, Callable[[dict], dict]] = {}


# ──────────────────────────────────────────── version detection
def _detect_current_version(config: dict) -> int:
    sv = config.get("template_schema_version")
    if sv is None:
        sys.exit(
            "error: config.yaml has no `template_schema_version`. "
            "Add it (likely 1 for legacy templates) before migrating."
        )
    if not isinstance(sv, int):
        sys.exit(
            f"error: template_schema_version must be int, got "
            f"{type(sv).__name__}={sv!r}."
        )
    return sv


def _latest_known_version() -> int:
    """Maximum version reachable by chaining MIGRATIONS from any
    starting point. With an empty registry, this is 1 (the floor:
    every existing template is at v1)."""
    if not MIGRATIONS:
        return 1
    return max(MIGRATIONS.keys()) + 1


# ──────────────────────────────────────────── core
def migrate_config(config: dict, from_version: int, to_version: int) -> dict:
    """Apply migrations sequentially from `from_version` to `to_version`.

    Returns a NEW dict; does not mutate the input.

    Errors loudly when there's no migration registered for an
    intermediate step: forward-only, never silently skip a hop. If the
    user asks for a backward migration, error too: schema versions
    are append-only and we don't ship downgrades."""
    if to_version < from_version:
        sys.exit(
            f"error: cannot migrate backward (from v{from_version} to "
            f"v{to_version}). Schema versions are append-only — file a "
            f"new bug + ship a forward migration instead."
        )
    current = from_version
    out = deepcopy(config)
    while current < to_version:
        step = MIGRATIONS.get(current)
        if step is None:
            sys.exit(
                f"error: no migration registered for v{current} → "
                f"v{current + 1}. Either add it to MIGRATIONS in "
                f"scripts/migrate-template.py or pick a different --to."
            )
        out = step(out)
        # Every migration MUST stamp the new version on its output.
        # This check catches a class of bugs where a migration
        # forgets to bump template_schema_version.
        if out.get("template_schema_version") != current + 1:
            sys.exit(
                f"error: MIGRATIONS[{current}] did not stamp "
                f"template_schema_version={current + 1} on its output. "
                f"This is a bug in the migration function itself."
            )
        current += 1
    return out


def _read_yaml(path: Path) -> dict:
    with open(path) as f:
        data = yaml.safe_load(f)
    if not isinstance(data, dict):
        sys.exit(f"error: {path} root is not a mapping (got {type(data).__name__})")
    return data


def _write_yaml(path: Path, data: dict) -> None:
    # Sort keys for stable diffs across migrations. This matches what
    # `yaml.safe_dump` does when we write. Consumer repos with
    # custom orderings will see their config.yaml re-ordered, which is
    # one of those round-trip lossy tradeoffs that's worth accepting:
    # the migration moment is rare and the diff is reviewable.
    with open(path, "w") as f:
        yaml.safe_dump(data, f, sort_keys=True, default_flow_style=False)


# ──────────────────────────────────────────── CLI
def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(
        description="Migrate a workspace template's config.yaml across schema versions."
    )
    parser.add_argument(
        "template_dir",
        type=Path,
        help="Path to the template repo root (must contain config.yaml).",
    )
    parser.add_argument(
        "--from",
        dest="from_version",
        type=int,
        default=None,
        help="Source schema version (defaults to whatever config.yaml declares).",
    )
    parser.add_argument(
        "--to",
        dest="to_version",
        type=int,
        default=None,
        help="Target schema version (defaults to the highest reachable from MIGRATIONS).",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Print the migrated YAML to stdout without modifying the file.",
    )
    args = parser.parse_args(argv)

    config_path = args.template_dir / "config.yaml"
    if not config_path.is_file():
        sys.exit(f"error: {config_path} does not exist")

    config = _read_yaml(config_path)

    from_version = args.from_version
    if from_version is None:
        from_version = _detect_current_version(config)

    to_version = args.to_version
    if to_version is None:
        to_version = _latest_known_version()

    if from_version == to_version:
        print(
            f"nothing to do: config.yaml is already at v{from_version}.",
            file=sys.stderr,
        )
        return 0

    migrated = migrate_config(config, from_version, to_version)

    if args.dry_run:
        yaml.safe_dump(migrated, sys.stdout, sort_keys=True, default_flow_style=False)
        return 0

    _write_yaml(config_path, migrated)
    print(
        f"✓ migrated {config_path} from v{from_version} → v{to_version}",
        file=sys.stderr,
    )
    return 0


if __name__ == "__main__":
    sys.exit(main())