extract(dev-tree): port dev tree from molecule-ai-org-template-molecule-dev with history
All checks were successful
Validate dev-department tree / Validate tree (pull_request) Successful in 58s
All checks were successful
Validate dev-department tree / Validate tree (pull_request) Successful in 58s
Phase 3c-2 of internal#77 (dev-department extraction).
What this commit lands:
- 27 dev-tree workspace folders + teams/{core-platform,controlplane,
app-docs,infra,sdk,documentation-specialist,triage-operator,dev}.yaml
composition layer (extracted via git filter-repo).
- 6 commits of git history preserved for the dev paths (subset of
parent's 11 — only commits that touched dev paths kept).
- 17 orphaned folders from parent template DROPPED at extract time
(backend-engineer{,-2,-3}, frontend-engineer{,-2,-3}, qa-engineer{,-2,-3},
security-auditor{,-2}, platform-engineer, devops-engineer, sre-engineer,
offensive-security-engineer, devrel-engineer, triage-operator-2,
uiux-designer). These were not reachable from any teams/*.yaml
!include chain in the parent.
- dev-department.yaml roots populated to point at teams/dev.yaml.
- teams/dev.yaml gains triage-operator as a direct child (Q2). Q1
doc-spec is already transitively under dev-lead via app-docs sub-team
(teams/app-docs.yaml children include documentation-specialist.yaml),
so no additional include needed there.
- Validator improvements (Phase 3b refinements):
* Registers files_dir-declared workspace folders from !include'd
team yamls (was missing — *-lead workspaces previously orphaned).
* Deduplicates inline files_dir registration after !include path
has already registered the same folder.
* Filesystem scan now treats system-prompt.md / initial-prompt.md
as workspace markers (workspace.yaml is optional in transitional
shape).
* --strict flag (or MOLECULE_VALIDATE_TREE_STRICT=1) makes
cross-tree '..' refs hard-fail instead of warn. Phase 3c-3 will
flip the CI gate to --strict once atomization removes them.
Validator state on this commit (default mode):
filesystem workspace folders : 28
reachable from manifest : 28
orphans : 0
cross-tree '..' refs : 20 [WARN]
duplicate-parent claims : 0
OK — tree is clean
The 20 cross-tree '..' refs are the transitional teams/<sub-team>.yaml
shape pre-atomization (Phase 3c-3 lifts each *-lead workspace into a
folder containing its sub-team children, dissolving teams/ entirely).
Refs:
internal#77 — extraction RFC
Hongming GO 2026-05-08 + 'approved keep going' 2026-05-08
SOP Phase 3c-2 — task #228
This commit is contained in:
parent
193100b04a
commit
711f011ec1
@ -6,21 +6,33 @@ Walks the manifest (org.yaml or dev-department.yaml) → roots → recursive
|
||||
`children:` (and `!include`) → builds the set of reachable workspace
|
||||
folders → compares against the filesystem → reports violations.
|
||||
|
||||
Catches the four failure modes that motivated the RFC (internal#77):
|
||||
Catches the failure modes that motivated the RFC (internal#77):
|
||||
|
||||
1. Orphan workspace folders (folder exists, no parent claims it).
|
||||
2. Cross-tree `..` traversal in `children:` paths (atomization rule).
|
||||
3. Workspace folder without `workspace.yaml` (broken nest).
|
||||
4. Two parents claiming the same child workspace (graph not a tree).
|
||||
3. Two parents claiming the same child workspace (graph not a tree).
|
||||
4. Generic errors (missing !include target, parse failures).
|
||||
|
||||
Usage:
|
||||
|
||||
.molecule-ci/scripts/validate-tree.py [<manifest>]
|
||||
.molecule-ci/scripts/validate-tree.py [--strict] [<manifest>]
|
||||
|
||||
Exits non-zero on any violation. With no arg, defaults to the first of
|
||||
{dev-department.yaml, org.yaml} that exists in cwd.
|
||||
Exits non-zero on orphans, duplicate parents, or generic errors. By
|
||||
default cross-tree `..` refs print as warnings — extracted trees that
|
||||
retain a transitional `teams/*.yaml` composition layer pre-atomization
|
||||
will have these. Pass `--strict` (or set
|
||||
`MOLECULE_VALIDATE_TREE_STRICT=1`) to also error on `..`. With no
|
||||
manifest arg, defaults to the first of {dev-department.yaml, org.yaml}
|
||||
found in cwd.
|
||||
|
||||
Standard library only — runs on every CI runner without `pip install`.
|
||||
A "workspace folder" is identified by any of:
|
||||
- a workspace.yaml file inside it (atomized shape), or
|
||||
- a system-prompt.md / initial-prompt.md file (transitional shape;
|
||||
workspace is declared in a parent yaml's children: with
|
||||
files_dir: <this folder>).
|
||||
|
||||
Standard library only — runs on every CI runner without `pip install`
|
||||
beyond PyYAML.
|
||||
|
||||
Refs: internal#77 (Phase 3b — task #223).
|
||||
"""
|
||||
@ -29,7 +41,6 @@ from __future__ import annotations
|
||||
|
||||
import os
|
||||
import sys
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
@ -49,14 +60,11 @@ INCLUDE_TAG = "!include"
|
||||
|
||||
class IncludingLoader(yaml.SafeLoader):
|
||||
"""SafeLoader that records `!include <path>` scalars verbatim instead
|
||||
of trying to resolve them. We do resolution explicitly so we can also
|
||||
track the parent→child edge for the orphan/duplicate check."""
|
||||
of resolving them. We do resolution explicitly so we can also track
|
||||
the parent→child edge for the orphan/duplicate check."""
|
||||
|
||||
|
||||
def _include_constructor(loader: yaml.Loader, node: yaml.Node) -> dict:
|
||||
"""Replace a `!include` scalar with a sentinel dict the walker
|
||||
interprets. We don't resolve the file content here — the walker does
|
||||
that with full path-context awareness."""
|
||||
if not isinstance(node, yaml.ScalarNode):
|
||||
raise yaml.YAMLError(f"!include must be a scalar path; got {node.tag} at line {node.start_mark.line}")
|
||||
return {"__include__": loader.construct_scalar(node)}
|
||||
@ -75,11 +83,10 @@ def _yaml_load(path: Path) -> Any:
|
||||
|
||||
class TreeReport:
|
||||
def __init__(self) -> None:
|
||||
self.parent_of: dict[str, str] = {} # workspace-folder → parent-folder
|
||||
self.cross_tree_refs: list[tuple[str, str]] = [] # (where, escaping path)
|
||||
self.duplicates: list[tuple[str, str, str]] = [] # (folder, parent_a, parent_b)
|
||||
self.missing_workspace_yaml: list[str] = [] # folders referenced as children but no workspace.yaml
|
||||
self.errors: list[str] = [] # generic errors (yaml parse, missing include)
|
||||
self.parent_of: dict[str, str] = {}
|
||||
self.cross_tree_refs: list[tuple[str, str]] = []
|
||||
self.duplicates: list[tuple[str, str, str]] = []
|
||||
self.errors: list[str] = []
|
||||
|
||||
def add_edge(self, parent_folder: str, child_folder: str) -> None:
|
||||
if child_folder in self.parent_of:
|
||||
@ -90,48 +97,43 @@ class TreeReport:
|
||||
def reachable(self) -> set[str]:
|
||||
return set(self.parent_of.keys())
|
||||
|
||||
def has_violations(self) -> bool:
|
||||
return bool(self.cross_tree_refs or self.duplicates or self.missing_workspace_yaml or self.errors)
|
||||
def has_hard_violations(self, strict: bool) -> bool:
|
||||
hard = bool(self.duplicates or self.errors)
|
||||
if strict:
|
||||
hard = hard or bool(self.cross_tree_refs)
|
||||
return hard
|
||||
|
||||
|
||||
def _walk_workspace_node(
|
||||
node: Any,
|
||||
yaml_dir: Path, # dir of the YAML file currently being processed (for relative paths)
|
||||
repo_root: Path, # repo root (for orphan-set comparison + escape detection)
|
||||
yaml_dir: Path,
|
||||
repo_root: Path,
|
||||
parent_folder: str | None,
|
||||
report: TreeReport,
|
||||
skip_files_dir_register: bool = False,
|
||||
) -> None:
|
||||
"""Walk a workspace-shaped dict (or list of children) recursively.
|
||||
"""Walk a workspace-shaped dict / list / !include sentinel recursively.
|
||||
|
||||
For each `!include` we encountered (now wrapped as `{"__include__": "<path>"}`),
|
||||
we resolve to the target file, register the workspace folder, and
|
||||
recurse into the loaded content.
|
||||
"""
|
||||
skip_files_dir_register: when True, the next dict-level files_dir
|
||||
won't add a parent→child edge — used after !include has already
|
||||
registered the workspace folder for the loaded yaml's content."""
|
||||
if node is None:
|
||||
return
|
||||
|
||||
# Top-level YAML doc may have `workspaces:` or `roots:` (the
|
||||
# dev-department.yaml convention) listing the root workspaces.
|
||||
# Top-level YAML doc may have `workspaces:` or `roots:` listing the roots.
|
||||
if isinstance(node, dict) and ("workspaces" in node or "roots" in node):
|
||||
roots = node.get("roots") or node.get("workspaces") or []
|
||||
for child in roots:
|
||||
_walk_workspace_node(child, yaml_dir, repo_root, parent_folder=None, report=report)
|
||||
return
|
||||
|
||||
# !include sentinel: resolve, register, recurse.
|
||||
# !include sentinel.
|
||||
if isinstance(node, dict) and "__include__" in node:
|
||||
rel = node["__include__"]
|
||||
target = (yaml_dir / rel).resolve()
|
||||
try:
|
||||
target.relative_to(repo_root.resolve())
|
||||
except ValueError:
|
||||
# The !include path escapes the repo root. This is the
|
||||
# cross-repo symlink case (parent template !include-ing into
|
||||
# the dev-department subtree via a symlink). The child folder
|
||||
# is OUTSIDE repo_root — record but don't claim as duplicate.
|
||||
# For the dev-department validator, repo_root IS dev-department,
|
||||
# so its own internal !includes never escape; cross-repo
|
||||
# composition is parent-template's concern.
|
||||
report.errors.append(
|
||||
f"!include {rel!r} (from {yaml_dir.name}) resolves outside repo root: {target}"
|
||||
)
|
||||
@ -140,69 +142,67 @@ def _walk_workspace_node(
|
||||
report.errors.append(f"!include {rel!r} (from {yaml_dir.name}): target does not exist: {target}")
|
||||
return
|
||||
|
||||
# If the include targets a workspace.yaml, the FOLDER containing
|
||||
# it is the workspace identity.
|
||||
try:
|
||||
sub = _yaml_load(target)
|
||||
except yaml.YAMLError as e:
|
||||
report.errors.append(f"yaml parse {target}: {e}")
|
||||
return
|
||||
|
||||
# Identify the workspace folder this !include refers to:
|
||||
# 1. include targets a workspace.yaml — its parent dir is the folder.
|
||||
# 2. include targets a yaml whose top-level dict has files_dir —
|
||||
# that files_dir is the folder.
|
||||
# 3. include targets a transparent composition file (no files_dir,
|
||||
# no workspace.yaml) — recurse without registering.
|
||||
child_folder: str | None = None
|
||||
if target.name == "workspace.yaml":
|
||||
child_folder = str(target.parent.resolve().relative_to(repo_root.resolve()))
|
||||
else:
|
||||
# Team-shaped !include (e.g. teams/core-platform.yaml) — not a
|
||||
# workspace folder of its own. Recurse into its content.
|
||||
child_folder = None
|
||||
elif isinstance(sub, dict) and sub.get("files_dir"):
|
||||
fd = sub["files_dir"]
|
||||
fd_resolved = (repo_root / fd).resolve()
|
||||
try:
|
||||
child_folder = str(fd_resolved.relative_to(repo_root.resolve()))
|
||||
except ValueError:
|
||||
report.errors.append(
|
||||
f"!include {rel!r} declares files_dir {fd!r} outside repo root"
|
||||
)
|
||||
return
|
||||
|
||||
# Cross-tree `..` ref check on the path the user wrote.
|
||||
if child_folder is not None and parent_folder is not None:
|
||||
# Reject `..` traversal in the path the user wrote (atomization
|
||||
# rule). The resolved target may legitimately be in a parent
|
||||
# folder (sibling tree), but the dev-department's `children:`
|
||||
# paths are required to be `./<child>` only.
|
||||
if rel.startswith("..") or "/.." in rel:
|
||||
report.cross_tree_refs.append((parent_folder, rel))
|
||||
|
||||
if child_folder is not None:
|
||||
report.add_edge(parent_folder or "<root>", child_folder)
|
||||
|
||||
# Load and recurse.
|
||||
try:
|
||||
sub = _yaml_load(target)
|
||||
except yaml.YAMLError as e:
|
||||
report.errors.append(f"yaml parse {target}: {e}")
|
||||
return
|
||||
_walk_workspace_node(
|
||||
sub,
|
||||
yaml_dir=target.parent,
|
||||
repo_root=repo_root,
|
||||
parent_folder=child_folder if child_folder is not None else parent_folder,
|
||||
report=report,
|
||||
# !include already registered child_folder; suppress inline
|
||||
# re-registration when the loaded yaml's top dict has the
|
||||
# same files_dir.
|
||||
skip_files_dir_register=child_folder is not None,
|
||||
)
|
||||
return
|
||||
|
||||
# Inline workspace-shaped dict.
|
||||
if isinstance(node, dict):
|
||||
# `files_dir:` identifies the workspace folder for inline declarations.
|
||||
files_dir = node.get("files_dir")
|
||||
if files_dir and parent_folder is None:
|
||||
# A root-level workspace declared inline (no !include). The
|
||||
# files_dir is the folder.
|
||||
files_dir_resolved = (repo_root / files_dir).resolve()
|
||||
current_folder = parent_folder
|
||||
if files_dir:
|
||||
fd_resolved = (repo_root / files_dir).resolve()
|
||||
try:
|
||||
rel_to_root = files_dir_resolved.relative_to(repo_root.resolve())
|
||||
this_folder = str(fd_resolved.relative_to(repo_root.resolve()))
|
||||
except ValueError:
|
||||
report.errors.append(f"files_dir {files_dir!r} escapes repo root")
|
||||
return
|
||||
this_folder = str(rel_to_root)
|
||||
report.add_edge("<root>", this_folder)
|
||||
# Verify a workspace.yaml exists in that folder for atomized
|
||||
# tree (post-Phase 3c-2).
|
||||
ws_yaml = files_dir_resolved / "workspace.yaml"
|
||||
if not ws_yaml.exists():
|
||||
# Pre-atomization, a workspace can be declared inline at
|
||||
# the manifest level without a workspace.yaml in its
|
||||
# files_dir. Don't false-positive.
|
||||
pass
|
||||
if not skip_files_dir_register:
|
||||
report.add_edge(parent_folder or "<root>", this_folder)
|
||||
current_folder = this_folder
|
||||
else:
|
||||
current_folder = parent_folder
|
||||
|
||||
# Recurse into children.
|
||||
for child in node.get("children") or []:
|
||||
_walk_workspace_node(child, yaml_dir, repo_root, current_folder, report)
|
||||
return
|
||||
@ -216,22 +216,26 @@ def _walk_workspace_node(
|
||||
# ---------- Filesystem scan ----------
|
||||
|
||||
|
||||
# Folders inside the repo that are NOT workspace folders. The validator
|
||||
# allows these to exist without a parent in the tree.
|
||||
NON_WORKSPACE_DIRS = {
|
||||
".git", ".github", ".molecule-ci", "docs", "scripts", "tests", "fixtures",
|
||||
"teams", # composition layer, not workspace folders
|
||||
"node_modules", "__pycache__", ".cache", ".venv", "venv",
|
||||
}
|
||||
|
||||
WORKSPACE_FOLDER_MARKERS = {
|
||||
"workspace.yaml", "system-prompt.md", "initial-prompt.md",
|
||||
}
|
||||
|
||||
|
||||
def _is_workspace_folder(filenames: list[str]) -> bool:
|
||||
return any(m in filenames for m in WORKSPACE_FOLDER_MARKERS)
|
||||
|
||||
|
||||
def _scan_workspace_folders(repo_root: Path) -> set[str]:
|
||||
"""Every directory containing a workspace.yaml is a workspace folder.
|
||||
Path returned is repo-relative and POSIX-style."""
|
||||
found: set[str] = set()
|
||||
for dirpath, dirnames, filenames in os.walk(repo_root, followlinks=False):
|
||||
# Prune obvious non-workspace dirs.
|
||||
dirnames[:] = [d for d in dirnames if d not in NON_WORKSPACE_DIRS]
|
||||
if "workspace.yaml" in filenames:
|
||||
if _is_workspace_folder(filenames):
|
||||
rel = Path(dirpath).resolve().relative_to(repo_root.resolve())
|
||||
if str(rel) != ".":
|
||||
found.add(str(rel))
|
||||
@ -253,10 +257,17 @@ def _find_manifest() -> Path:
|
||||
|
||||
|
||||
def main() -> int:
|
||||
if len(sys.argv) > 2:
|
||||
sys.stderr.write("usage: validate-tree.py [<manifest>]\n")
|
||||
args = sys.argv[1:]
|
||||
strict = False
|
||||
if "--strict" in args:
|
||||
strict = True
|
||||
args = [a for a in args if a != "--strict"]
|
||||
if os.environ.get("MOLECULE_VALIDATE_TREE_STRICT") == "1":
|
||||
strict = True
|
||||
if len(args) > 1:
|
||||
sys.stderr.write("usage: validate-tree.py [--strict] [<manifest>]\n")
|
||||
return 2
|
||||
manifest = Path(sys.argv[1]) if len(sys.argv) == 2 else _find_manifest()
|
||||
manifest = Path(args[0]) if args else _find_manifest()
|
||||
if not manifest.exists():
|
||||
sys.stderr.write(f"validate-tree.py: manifest does not exist: {manifest}\n")
|
||||
return 2
|
||||
@ -282,15 +293,13 @@ def main() -> int:
|
||||
reachable = report.reachable()
|
||||
orphans = sorted(fs_workspaces - reachable)
|
||||
|
||||
# Build report.
|
||||
print(f"=== validate-tree.py report — manifest: {manifest} ===")
|
||||
print(f" filesystem workspace folders : {len(fs_workspaces)}")
|
||||
print(f" reachable from manifest : {len(reachable)}")
|
||||
print(f" orphans : {len(orphans)}")
|
||||
print(f" cross-tree '..' refs : {len(report.cross_tree_refs)}")
|
||||
print(f" duplicate-parent claims : {len(report.duplicates)}")
|
||||
print(f" missing workspace.yaml : {len(report.missing_workspace_yaml)}")
|
||||
print(f" generic errors : {len(report.errors)}")
|
||||
print(f" orphans : {len(orphans)}")
|
||||
print(f" cross-tree '..' refs : {len(report.cross_tree_refs)}")
|
||||
print(f" duplicate-parent claims : {len(report.duplicates)}")
|
||||
print(f" generic errors : {len(report.errors)}")
|
||||
print()
|
||||
|
||||
if orphans:
|
||||
@ -299,9 +308,13 @@ def main() -> int:
|
||||
print(f" - {o}")
|
||||
print()
|
||||
if report.cross_tree_refs:
|
||||
print("CROSS-TREE '..' REFS (atomization rule violation):")
|
||||
sev = "ERROR" if strict else "WARN"
|
||||
print(f"CROSS-TREE '..' REFS [{sev}] (atomization rule):")
|
||||
for parent, path in report.cross_tree_refs:
|
||||
print(f" - parent={parent} path={path}")
|
||||
if not strict:
|
||||
print(" (warn-only without --strict; pre-atomization extracted trees keep")
|
||||
print(" transitional `..` refs in teams/*.yaml; Phase 3c-3 removes them)")
|
||||
print()
|
||||
if report.duplicates:
|
||||
print("DUPLICATE PARENT CLAIMS (graph not a tree):")
|
||||
@ -314,11 +327,12 @@ def main() -> int:
|
||||
print(f" - {e}")
|
||||
print()
|
||||
|
||||
fail = bool(orphans) or report.has_violations()
|
||||
fail = bool(orphans) or report.has_hard_violations(strict)
|
||||
if fail:
|
||||
print("FAIL — see above")
|
||||
return 1
|
||||
print("OK — tree is clean")
|
||||
suffix = " (strict)" if strict else ""
|
||||
print(f"OK — tree is clean{suffix}")
|
||||
return 0
|
||||
|
||||
|
||||
|
||||
@ -66,15 +66,21 @@ defaults:
|
||||
|
||||
# Roots block: list the top-level workspaces of this subtree.
|
||||
#
|
||||
# Each root entry is a `!include <path>/workspace.yaml` reference to a
|
||||
# workspace folder at the repo root level. The validator walks each
|
||||
# referenced workspace.yaml recursively via its `children:` field.
|
||||
# Each root entry resolves through `!include` to a workspace.yaml file.
|
||||
# The validator walks each referenced workspace.yaml recursively via its
|
||||
# `children:` field.
|
||||
#
|
||||
# Atomization rule (Hongming Q3+Q5): `children:` paths inside a
|
||||
# workspace.yaml MUST be relative-and-down-only (`./<child>`); no `..`.
|
||||
# workspace.yaml SHOULD be relative-and-down-only (`./<child>`); no `..`.
|
||||
# The `.molecule-ci/scripts/validate-tree.py` CI gate enforces this.
|
||||
# CURRENT STATE: extracted tree retains the parent template's flat shape
|
||||
# with `teams/*.yaml` !include'ing siblings via `..`. Atomization to
|
||||
# nested folders is Phase 3c-3 (next PR).
|
||||
#
|
||||
# This list is empty in the scaffold commit. Phase 3c-2 (extract content
|
||||
# with git history) populates it. Phase 3c-3 nests doc-spec + triage-op
|
||||
# under dev-lead/.
|
||||
roots: []
|
||||
# Phase 3c-2 (this PR): roots: points at teams/dev.yaml as the single
|
||||
# Dev Lead root that recursively pulls in core-platform, controlplane,
|
||||
# app-docs, infra, sdk sub-teams + release-manager + integration-tester
|
||||
# + fullstack-engineer floaters + documentation-specialist + triage-operator
|
||||
# (the last two added per Hongming Q1+Q2).
|
||||
roots:
|
||||
- !include teams/dev.yaml
|
||||
|
||||
@ -35,4 +35,9 @@ children:
|
||||
- !include ../release-manager/workspace.yaml
|
||||
- !include ../integration-tester/workspace.yaml
|
||||
- !include ../fullstack-engineer/workspace.yaml
|
||||
# Q2 (Hongming 2026-05-08): triage-operator moved into dev tree as dev-lead child.
|
||||
# Q1: doc-spec is already a child of teams/app-docs.yaml (within app-lead),
|
||||
# so it's transitively under dev-lead via the app-docs sub-team — no
|
||||
# additional include here, otherwise the validator flags duplicate parent.
|
||||
- !include triage-operator.yaml
|
||||
initial_prompt_file: initial-prompt.md
|
||||
|
||||
Loading…
Reference in New Issue
Block a user