extract(dev-tree): port dev tree from molecule-ai-org-template-molecule-dev with history
All checks were successful
Validate dev-department tree / Validate tree (pull_request) Successful in 58s

Phase 3c-2 of internal#77 (dev-department extraction).

What this commit lands:

- 27 dev-tree workspace folders + teams/{core-platform,controlplane,
  app-docs,infra,sdk,documentation-specialist,triage-operator,dev}.yaml
  composition layer (extracted via git filter-repo).
- 6 commits of git history preserved for the dev paths (subset of
  parent's 11 — only commits that touched dev paths kept).
- 17 orphaned folders from parent template DROPPED at extract time
  (backend-engineer{,-2,-3}, frontend-engineer{,-2,-3}, qa-engineer{,-2,-3},
  security-auditor{,-2}, platform-engineer, devops-engineer, sre-engineer,
  offensive-security-engineer, devrel-engineer, triage-operator-2,
  uiux-designer). These were not reachable from any teams/*.yaml
  !include chain in the parent.
- dev-department.yaml roots populated to point at teams/dev.yaml.
- teams/dev.yaml gains triage-operator as a direct child (Q2). Q1
  doc-spec is already transitively under dev-lead via app-docs sub-team
  (teams/app-docs.yaml children include documentation-specialist.yaml),
  so no additional include needed there.
- Validator improvements (Phase 3b refinements):
  * Registers files_dir-declared workspace folders from !include'd
    team yamls (was missing — *-lead workspaces previously orphaned).
  * Deduplicates inline files_dir registration after !include path
    has already registered the same folder.
  * Filesystem scan now treats system-prompt.md / initial-prompt.md
    as workspace markers (workspace.yaml is optional in transitional
    shape).
  * --strict flag (or MOLECULE_VALIDATE_TREE_STRICT=1) makes
    cross-tree '..' refs hard-fail instead of warn. Phase 3c-3 will
    flip the CI gate to --strict once atomization removes them.

Validator state on this commit (default mode):
  filesystem workspace folders : 28
  reachable from manifest      : 28
  orphans                      : 0
  cross-tree '..' refs         : 20  [WARN]
  duplicate-parent claims      : 0
  OK — tree is clean

The 20 cross-tree '..' refs are the transitional teams/<sub-team>.yaml
shape pre-atomization (Phase 3c-3 lifts each *-lead workspace into a
folder containing its sub-team children, dissolving teams/ entirely).

Refs:
  internal#77 — extraction RFC
  Hongming GO 2026-05-08 + 'approved keep going' 2026-05-08
  SOP Phase 3c-2 — task #228
This commit is contained in:
claude-ceo-assistant 2026-05-08 04:02:00 -07:00
parent 193100b04a
commit 711f011ec1
3 changed files with 124 additions and 99 deletions

View File

@ -6,21 +6,33 @@ Walks the manifest (org.yaml or dev-department.yaml) → roots → recursive
`children:` (and `!include`) builds the set of reachable workspace
folders compares against the filesystem reports violations.
Catches the four failure modes that motivated the RFC (internal#77):
Catches the failure modes that motivated the RFC (internal#77):
1. Orphan workspace folders (folder exists, no parent claims it).
2. Cross-tree `..` traversal in `children:` paths (atomization rule).
3. Workspace folder without `workspace.yaml` (broken nest).
4. Two parents claiming the same child workspace (graph not a tree).
3. Two parents claiming the same child workspace (graph not a tree).
4. Generic errors (missing !include target, parse failures).
Usage:
.molecule-ci/scripts/validate-tree.py [<manifest>]
.molecule-ci/scripts/validate-tree.py [--strict] [<manifest>]
Exits non-zero on any violation. With no arg, defaults to the first of
{dev-department.yaml, org.yaml} that exists in cwd.
Exits non-zero on orphans, duplicate parents, or generic errors. By
default cross-tree `..` refs print as warnings extracted trees that
retain a transitional `teams/*.yaml` composition layer pre-atomization
will have these. Pass `--strict` (or set
`MOLECULE_VALIDATE_TREE_STRICT=1`) to also error on `..`. With no
manifest arg, defaults to the first of {dev-department.yaml, org.yaml}
found in cwd.
Standard library only runs on every CI runner without `pip install`.
A "workspace folder" is identified by any of:
- a workspace.yaml file inside it (atomized shape), or
- a system-prompt.md / initial-prompt.md file (transitional shape;
workspace is declared in a parent yaml's children: with
files_dir: <this folder>).
Standard library only runs on every CI runner without `pip install`
beyond PyYAML.
Refs: internal#77 (Phase 3b — task #223).
"""
@ -29,7 +41,6 @@ from __future__ import annotations
import os
import sys
import re
from pathlib import Path
from typing import Any
@ -49,14 +60,11 @@ INCLUDE_TAG = "!include"
class IncludingLoader(yaml.SafeLoader):
"""SafeLoader that records `!include <path>` scalars verbatim instead
of trying to resolve them. We do resolution explicitly so we can also
track the parentchild edge for the orphan/duplicate check."""
of resolving them. We do resolution explicitly so we can also track
the parentchild edge for the orphan/duplicate check."""
def _include_constructor(loader: yaml.Loader, node: yaml.Node) -> dict:
"""Replace a `!include` scalar with a sentinel dict the walker
interprets. We don't resolve the file content here — the walker does
that with full path-context awareness."""
if not isinstance(node, yaml.ScalarNode):
raise yaml.YAMLError(f"!include must be a scalar path; got {node.tag} at line {node.start_mark.line}")
return {"__include__": loader.construct_scalar(node)}
@ -75,11 +83,10 @@ def _yaml_load(path: Path) -> Any:
class TreeReport:
def __init__(self) -> None:
self.parent_of: dict[str, str] = {} # workspace-folder → parent-folder
self.cross_tree_refs: list[tuple[str, str]] = [] # (where, escaping path)
self.duplicates: list[tuple[str, str, str]] = [] # (folder, parent_a, parent_b)
self.missing_workspace_yaml: list[str] = [] # folders referenced as children but no workspace.yaml
self.errors: list[str] = [] # generic errors (yaml parse, missing include)
self.parent_of: dict[str, str] = {}
self.cross_tree_refs: list[tuple[str, str]] = []
self.duplicates: list[tuple[str, str, str]] = []
self.errors: list[str] = []
def add_edge(self, parent_folder: str, child_folder: str) -> None:
if child_folder in self.parent_of:
@ -90,48 +97,43 @@ class TreeReport:
def reachable(self) -> set[str]:
return set(self.parent_of.keys())
def has_violations(self) -> bool:
return bool(self.cross_tree_refs or self.duplicates or self.missing_workspace_yaml or self.errors)
def has_hard_violations(self, strict: bool) -> bool:
hard = bool(self.duplicates or self.errors)
if strict:
hard = hard or bool(self.cross_tree_refs)
return hard
def _walk_workspace_node(
node: Any,
yaml_dir: Path, # dir of the YAML file currently being processed (for relative paths)
repo_root: Path, # repo root (for orphan-set comparison + escape detection)
yaml_dir: Path,
repo_root: Path,
parent_folder: str | None,
report: TreeReport,
skip_files_dir_register: bool = False,
) -> None:
"""Walk a workspace-shaped dict (or list of children) recursively.
"""Walk a workspace-shaped dict / list / !include sentinel recursively.
For each `!include` we encountered (now wrapped as `{"__include__": "<path>"}`),
we resolve to the target file, register the workspace folder, and
recurse into the loaded content.
"""
skip_files_dir_register: when True, the next dict-level files_dir
won't add a parent→child edge — used after !include has already
registered the workspace folder for the loaded yaml's content."""
if node is None:
return
# Top-level YAML doc may have `workspaces:` or `roots:` (the
# dev-department.yaml convention) listing the root workspaces.
# Top-level YAML doc may have `workspaces:` or `roots:` listing the roots.
if isinstance(node, dict) and ("workspaces" in node or "roots" in node):
roots = node.get("roots") or node.get("workspaces") or []
for child in roots:
_walk_workspace_node(child, yaml_dir, repo_root, parent_folder=None, report=report)
return
# !include sentinel: resolve, register, recurse.
# !include sentinel.
if isinstance(node, dict) and "__include__" in node:
rel = node["__include__"]
target = (yaml_dir / rel).resolve()
try:
target.relative_to(repo_root.resolve())
except ValueError:
# The !include path escapes the repo root. This is the
# cross-repo symlink case (parent template !include-ing into
# the dev-department subtree via a symlink). The child folder
# is OUTSIDE repo_root — record but don't claim as duplicate.
# For the dev-department validator, repo_root IS dev-department,
# so its own internal !includes never escape; cross-repo
# composition is parent-template's concern.
report.errors.append(
f"!include {rel!r} (from {yaml_dir.name}) resolves outside repo root: {target}"
)
@ -140,69 +142,67 @@ def _walk_workspace_node(
report.errors.append(f"!include {rel!r} (from {yaml_dir.name}): target does not exist: {target}")
return
# If the include targets a workspace.yaml, the FOLDER containing
# it is the workspace identity.
try:
sub = _yaml_load(target)
except yaml.YAMLError as e:
report.errors.append(f"yaml parse {target}: {e}")
return
# Identify the workspace folder this !include refers to:
# 1. include targets a workspace.yaml — its parent dir is the folder.
# 2. include targets a yaml whose top-level dict has files_dir —
# that files_dir is the folder.
# 3. include targets a transparent composition file (no files_dir,
# no workspace.yaml) — recurse without registering.
child_folder: str | None = None
if target.name == "workspace.yaml":
child_folder = str(target.parent.resolve().relative_to(repo_root.resolve()))
else:
# Team-shaped !include (e.g. teams/core-platform.yaml) — not a
# workspace folder of its own. Recurse into its content.
child_folder = None
elif isinstance(sub, dict) and sub.get("files_dir"):
fd = sub["files_dir"]
fd_resolved = (repo_root / fd).resolve()
try:
child_folder = str(fd_resolved.relative_to(repo_root.resolve()))
except ValueError:
report.errors.append(
f"!include {rel!r} declares files_dir {fd!r} outside repo root"
)
return
# Cross-tree `..` ref check on the path the user wrote.
if child_folder is not None and parent_folder is not None:
# Reject `..` traversal in the path the user wrote (atomization
# rule). The resolved target may legitimately be in a parent
# folder (sibling tree), but the dev-department's `children:`
# paths are required to be `./<child>` only.
if rel.startswith("..") or "/.." in rel:
report.cross_tree_refs.append((parent_folder, rel))
if child_folder is not None:
report.add_edge(parent_folder or "<root>", child_folder)
# Load and recurse.
try:
sub = _yaml_load(target)
except yaml.YAMLError as e:
report.errors.append(f"yaml parse {target}: {e}")
return
_walk_workspace_node(
sub,
yaml_dir=target.parent,
repo_root=repo_root,
parent_folder=child_folder if child_folder is not None else parent_folder,
report=report,
# !include already registered child_folder; suppress inline
# re-registration when the loaded yaml's top dict has the
# same files_dir.
skip_files_dir_register=child_folder is not None,
)
return
# Inline workspace-shaped dict.
if isinstance(node, dict):
# `files_dir:` identifies the workspace folder for inline declarations.
files_dir = node.get("files_dir")
if files_dir and parent_folder is None:
# A root-level workspace declared inline (no !include). The
# files_dir is the folder.
files_dir_resolved = (repo_root / files_dir).resolve()
current_folder = parent_folder
if files_dir:
fd_resolved = (repo_root / files_dir).resolve()
try:
rel_to_root = files_dir_resolved.relative_to(repo_root.resolve())
this_folder = str(fd_resolved.relative_to(repo_root.resolve()))
except ValueError:
report.errors.append(f"files_dir {files_dir!r} escapes repo root")
return
this_folder = str(rel_to_root)
report.add_edge("<root>", this_folder)
# Verify a workspace.yaml exists in that folder for atomized
# tree (post-Phase 3c-2).
ws_yaml = files_dir_resolved / "workspace.yaml"
if not ws_yaml.exists():
# Pre-atomization, a workspace can be declared inline at
# the manifest level without a workspace.yaml in its
# files_dir. Don't false-positive.
pass
if not skip_files_dir_register:
report.add_edge(parent_folder or "<root>", this_folder)
current_folder = this_folder
else:
current_folder = parent_folder
# Recurse into children.
for child in node.get("children") or []:
_walk_workspace_node(child, yaml_dir, repo_root, current_folder, report)
return
@ -216,22 +216,26 @@ def _walk_workspace_node(
# ---------- Filesystem scan ----------
# Folders inside the repo that are NOT workspace folders. The validator
# allows these to exist without a parent in the tree.
NON_WORKSPACE_DIRS = {
".git", ".github", ".molecule-ci", "docs", "scripts", "tests", "fixtures",
"teams", # composition layer, not workspace folders
"node_modules", "__pycache__", ".cache", ".venv", "venv",
}
WORKSPACE_FOLDER_MARKERS = {
"workspace.yaml", "system-prompt.md", "initial-prompt.md",
}
def _is_workspace_folder(filenames: list[str]) -> bool:
return any(m in filenames for m in WORKSPACE_FOLDER_MARKERS)
def _scan_workspace_folders(repo_root: Path) -> set[str]:
"""Every directory containing a workspace.yaml is a workspace folder.
Path returned is repo-relative and POSIX-style."""
found: set[str] = set()
for dirpath, dirnames, filenames in os.walk(repo_root, followlinks=False):
# Prune obvious non-workspace dirs.
dirnames[:] = [d for d in dirnames if d not in NON_WORKSPACE_DIRS]
if "workspace.yaml" in filenames:
if _is_workspace_folder(filenames):
rel = Path(dirpath).resolve().relative_to(repo_root.resolve())
if str(rel) != ".":
found.add(str(rel))
@ -253,10 +257,17 @@ def _find_manifest() -> Path:
def main() -> int:
if len(sys.argv) > 2:
sys.stderr.write("usage: validate-tree.py [<manifest>]\n")
args = sys.argv[1:]
strict = False
if "--strict" in args:
strict = True
args = [a for a in args if a != "--strict"]
if os.environ.get("MOLECULE_VALIDATE_TREE_STRICT") == "1":
strict = True
if len(args) > 1:
sys.stderr.write("usage: validate-tree.py [--strict] [<manifest>]\n")
return 2
manifest = Path(sys.argv[1]) if len(sys.argv) == 2 else _find_manifest()
manifest = Path(args[0]) if args else _find_manifest()
if not manifest.exists():
sys.stderr.write(f"validate-tree.py: manifest does not exist: {manifest}\n")
return 2
@ -282,15 +293,13 @@ def main() -> int:
reachable = report.reachable()
orphans = sorted(fs_workspaces - reachable)
# Build report.
print(f"=== validate-tree.py report — manifest: {manifest} ===")
print(f" filesystem workspace folders : {len(fs_workspaces)}")
print(f" reachable from manifest : {len(reachable)}")
print(f" orphans : {len(orphans)}")
print(f" cross-tree '..' refs : {len(report.cross_tree_refs)}")
print(f" duplicate-parent claims : {len(report.duplicates)}")
print(f" missing workspace.yaml : {len(report.missing_workspace_yaml)}")
print(f" generic errors : {len(report.errors)}")
print(f" orphans : {len(orphans)}")
print(f" cross-tree '..' refs : {len(report.cross_tree_refs)}")
print(f" duplicate-parent claims : {len(report.duplicates)}")
print(f" generic errors : {len(report.errors)}")
print()
if orphans:
@ -299,9 +308,13 @@ def main() -> int:
print(f" - {o}")
print()
if report.cross_tree_refs:
print("CROSS-TREE '..' REFS (atomization rule violation):")
sev = "ERROR" if strict else "WARN"
print(f"CROSS-TREE '..' REFS [{sev}] (atomization rule):")
for parent, path in report.cross_tree_refs:
print(f" - parent={parent} path={path}")
if not strict:
print(" (warn-only without --strict; pre-atomization extracted trees keep")
print(" transitional `..` refs in teams/*.yaml; Phase 3c-3 removes them)")
print()
if report.duplicates:
print("DUPLICATE PARENT CLAIMS (graph not a tree):")
@ -314,11 +327,12 @@ def main() -> int:
print(f" - {e}")
print()
fail = bool(orphans) or report.has_violations()
fail = bool(orphans) or report.has_hard_violations(strict)
if fail:
print("FAIL — see above")
return 1
print("OK — tree is clean")
suffix = " (strict)" if strict else ""
print(f"OK — tree is clean{suffix}")
return 0

View File

@ -66,15 +66,21 @@ defaults:
# Roots block: list the top-level workspaces of this subtree.
#
# Each root entry is a `!include <path>/workspace.yaml` reference to a
# workspace folder at the repo root level. The validator walks each
# referenced workspace.yaml recursively via its `children:` field.
# Each root entry resolves through `!include` to a workspace.yaml file.
# The validator walks each referenced workspace.yaml recursively via its
# `children:` field.
#
# Atomization rule (Hongming Q3+Q5): `children:` paths inside a
# workspace.yaml MUST be relative-and-down-only (`./<child>`); no `..`.
# workspace.yaml SHOULD be relative-and-down-only (`./<child>`); no `..`.
# The `.molecule-ci/scripts/validate-tree.py` CI gate enforces this.
# CURRENT STATE: extracted tree retains the parent template's flat shape
# with `teams/*.yaml` !include'ing siblings via `..`. Atomization to
# nested folders is Phase 3c-3 (next PR).
#
# This list is empty in the scaffold commit. Phase 3c-2 (extract content
# with git history) populates it. Phase 3c-3 nests doc-spec + triage-op
# under dev-lead/.
roots: []
# Phase 3c-2 (this PR): roots: points at teams/dev.yaml as the single
# Dev Lead root that recursively pulls in core-platform, controlplane,
# app-docs, infra, sdk sub-teams + release-manager + integration-tester
# + fullstack-engineer floaters + documentation-specialist + triage-operator
# (the last two added per Hongming Q1+Q2).
roots:
- !include teams/dev.yaml

View File

@ -35,4 +35,9 @@ children:
- !include ../release-manager/workspace.yaml
- !include ../integration-tester/workspace.yaml
- !include ../fullstack-engineer/workspace.yaml
# Q2 (Hongming 2026-05-08): triage-operator moved into dev tree as dev-lead child.
# Q1: doc-spec is already a child of teams/app-docs.yaml (within app-lead),
# so it's transitively under dev-lead via the app-docs sub-team — no
# additional include here, otherwise the validator flags duplicate parent.
- !include triage-operator.yaml
initial_prompt_file: initial-prompt.md