Compare commits


23 Commits

Author SHA1 Message Date
c2f5d68830 Merge pull request 'feat(actions): add audit-force-merge composite action' (#5) from feat/audit-force-merge-composite-action into main 2026-05-09 03:30:02 +00:00
120b71c564 feat(actions): add audit-force-merge composite action
§SOP-6 force-merge detector, hosted as a Gitea Actions composite
action so it can be vendored into every org repo via a single
`uses:` line instead of copy-pasting the bash. Source of truth
for the audit script logic.

Why composite vs reusable workflow: Gitea 1.22.6 doesn't support
cross-repo `uses: org/repo/.gitea/workflows/X.yml@ref`. Cross-repo
reusable workflows landed in go-gitea/gitea#32562 (1.26.0, Oct 2025)
and have not been backported. Composite actions resolve via the
actions-fetch path which works cross-repo against a public callee.
Re-evaluate when operator host runs Gitea ≥ 1.26.

Consumer workflow shape:

    on:
      pull_request_target:
        types: [closed]
    jobs:
      audit:
        if: github.event.pull_request.merged == true
        runs-on: ubuntu-latest
        steps:
          - uses: molecule-ai/molecule-ci/.gitea/actions/audit-force-merge@main
            with:
              gitea-token: ${{ secrets.SOP_TIER_CHECK_TOKEN }}
              repo: ${{ github.repository }}
              pr-number: ${{ github.event.pull_request.number }}
              required-checks: |
                sop-tier-check / tier-check (pull_request)

No actions/checkout step needed in the consumer — the audit script
does pure API calls, never reads working tree. Removing checkout is
also a small security win (PR head code never loaded).

Verified end-to-end on internal#123 + molecule-core#150 with the
inline copies (which this PR will replace via consumer-side stub
PRs once merged). Tier: low.
2026-05-08 20:29:40 -07:00
9f76a0faab Merge pull request 'fix(validate): recognize !external + !include as opaque refs (skip, not error)' (#4) from fix/validator-external-include-tags into main 2026-05-08 15:52:57 +00:00
dev-lead
d47c15d526 fix(validate): recognize !external + !include as opaque refs (skip, not error)
molecule-ai-org-template-molecule-dev's CI has been red since the
"pin: dev-department v1.0.0" merge. Symptom:

  ::error::Workspace at <unnamed>: missing 'name'
  ::error::Workspace at <unnamed>: missing 'name'

Root cause: org.yaml uses `!external` for the dev-department subtree
fetch (introduced internal#77 / molecule-core#105). The PermissiveLoader
formerly handed every unknown tag to a single multi-constructor that
flattens the parsed value to a plain dict. The validator's
validate_workspace() then saw a dict with no `name` key and tripped
the "missing name" error — but the dict was a `!external` directive,
not a malformed workspace.

The fix wraps both supported tags in distinct sentinel types:

  - !include  → IncludeRef (str subclass)
  - !external → ExternalRef (dict subclass)

validate_workspace() and count_ws() now skip these instead of treating
them as workspace shape. Real workspace dicts (with names) still get
the full structural check. Unknown tags fall through to the
multi-constructor exactly as before, preserving back-compat.

Verified on the live failing org.yaml:
  ✓ org.yaml valid: Molecule AI Dev Team (0 direct workspaces;
    external refs not counted)

And on a synthetic case with one real bug (missing-name workspace
nested under children):
  ::error::Workspace at <unnamed>: missing 'name'
  ::error::Workspace at <unnamed>/<unnamed>: missing 'name'
  exit 1

So the validator still catches real shape bugs; it just doesn't
false-positive on the new !external pattern.
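
A minimal sketch of the sentinel-type approach, assuming PyYAML and heavily simplified stand-ins for the loader/validator described above (class and tag names follow this message; the real validator has more structure):

```python
import yaml

class IncludeRef(str):
    """Opaque marker for an `!include` target; validators skip these."""

class ExternalRef(dict):
    """Opaque marker for an `!external` directive; validators skip these."""

class PermissiveLoader(yaml.SafeLoader):
    pass

# Distinct sentinel constructors for the two supported tags.
PermissiveLoader.add_constructor(
    '!include', lambda loader, node: IncludeRef(loader.construct_scalar(node)))
PermissiveLoader.add_constructor(
    '!external', lambda loader, node: ExternalRef(
        loader.construct_mapping(node, deep=True)))

def _unknown(loader, tag_suffix, node):
    # Back-compat: any other unknown tag flattens to a plain value,
    # exactly as before.
    if isinstance(node, yaml.MappingNode):
        return loader.construct_mapping(node, deep=True)
    if isinstance(node, yaml.SequenceNode):
        return loader.construct_sequence(node, deep=True)
    return loader.construct_scalar(node)

# Exact-tag constructors take precedence over the '!' multi-constructor.
PermissiveLoader.add_multi_constructor('!', _unknown)

def validate_workspace(ws, path='<root>', errors=None):
    """Skip opaque refs; require `name` on real workspace dicts."""
    errors = [] if errors is None else errors
    if isinstance(ws, (IncludeRef, ExternalRef)):
        return errors  # opaque ref — skip, not error
    if not isinstance(ws, dict) or 'name' not in ws:
        errors.append(f"Workspace at {path}: missing 'name'")
        return errors
    for child in ws.get('children', []):
        validate_workspace(child, f"{path}/{ws['name']}", errors)
    return errors
```

Real workspace dicts still get the structural check; the two tags come back as distinguishable subclasses instead of anonymous dicts/strings.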

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 08:52:32 -07:00
785251f9ab Merge pull request 'fix(ci): replace cross-repo actions/checkout with direct git clone' (#3) from fix/git-clone-instead-of-actions-checkout into main 2026-05-07 08:40:43 +00:00
security-auditor
3eb62072a2 fix(ci): replace cross-repo actions/checkout with direct git clone
molecule-ci#2 attempted `token: ''` to force an anonymous cross-repo
checkout. CI on plugin-molecule-careful-bash@663bf72 (post-merge of #2)
revealed that actions/checkout@v4 errors with:

  ::error::Input required and not supplied: token

Even though token's input definition is required:false with a default,
the action's runtime auth-helper calls getInput('token', {required: true})
internally — empty string fails that check.

Fix: replace the cross-repo actions/checkout with a direct git clone
shell step. molecule-ci is public; anonymous git clone has neither the
auth-trips-Gitea-404 problem (#2's target) nor the
empty-token-input-required problem (#2's actual failure shape).

3 files updated, 4 sites total:
  * validate-plugin.yml (1 site)
  * validate-workspace-template.yml (2 sites)
  * validate-org-template.yml (1 site)

Refs: internal#46. Closes the third root cause uncovered by the
verification cycle on plugin-molecule-careful-bash.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 01:37:34 -07:00
d2bb7cf255 Merge pull request 'fix(ci): force anon checkout of public molecule-ci to bypass Gitea cross-repo 404' (#2) from fix/anon-cross-repo-checkout into main 2026-05-07 08:34:55 +00:00
security-auditor
7e2bde9b77 fix(ci): force anon checkout of public molecule-ci to bypass Gitea cross-repo 404
After lowercasing the slug (molecule-ci#1) and flipping molecule-ci public,
plugin/template/org-template CI still failed at the SECOND actions/checkout
step (the one that fetches molecule-ci itself for canonical validator scripts).

Failure mode in act_runner log:
  Run actions/checkout@v4
    repository: molecule-ai/molecule-ci
    path: .molecule-ci-canonical
  Syncing repository: molecule-ai/molecule-ci
  [git config http.https://git.moleculesai.app/.extraheader AUTHORIZATION: basic ***]
  ::error::The target couldn't be found.
   Failure - Main actions/checkout@v4

Root cause: actions/checkout@v4 sends `Authorization: basic <github.token>` —
the per-job Gitea-issued token, scoped to the calling plugin/template repo
only. On Gitea, an authenticated request that lacks repo-permission 404s
instead of falling back to anonymous-public-read (a Gitea-vs-GitHub
behaviour difference). Anonymous git clone of molecule-ci succeeds; the auth
header is what trips the 404.

Fix: pass `token: ''` to force anonymous fetch on the cross-repo checkouts.
molecule-ci is public; no auth is needed for read.

3 sites updated:
  * validate-plugin.yml (1 site)
  * validate-workspace-template.yml (2 sites — both jobs in the file)
  * validate-org-template.yml (1 site)

Verification plan: re-trigger plugin-molecule-careful-bash#2; it must be
GREEN end-to-end after this lands. The 33 downstream lowercase-slug PRs
are NOT mass-merged until that verification.

Refs: internal#46

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 01:23:37 -07:00
226975d377 Merge pull request 'fix(ci): lowercase 'molecule-ai/' in cross-repo workflow refs' (#1) from fix/lowercase-org-slug into main 2026-05-07 08:07:02 +00:00
security-auditor
2bcd52b444 fix(ci): lowercase 'molecule-ai/' in cross-repo workflow refs
Gitea is case-sensitive on owner slugs; canonical is lowercase
`molecule-ai/...`. Mixed-case `Molecule-AI/...` refs fail-at-0s
when the runner tries to resolve the cross-repo workflow / checkout.

Same fix as molecule-controlplane#12. Mechanical case-correction;
no behavior change beyond making CI resolve again.

Refs: internal#46

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 00:58:55 -07:00
Hongming Wang
b31b722899
Merge pull request #33 from Molecule-AI/feat/turn-smoke-publish-image-gate
ci(publish): bump boot-smoke timeout to 90s/120s for SDK-init-wedge coverage
2026-05-01 17:52:28 -07:00
Hongming Wang
50e84f89e9 ci(publish): bump boot-smoke timeout to 90s/120s for SDK-init-wedge coverage
Pairs with molecule-core PR #2473 (run_executor_smoke now consults
runtime_wedge.is_wedged() at the end of every result path).

10s smoke timeout was shorter than claude-agent-sdk's 60s
initialize() handshake — when a malformed CLI argv made the SDK
spin on init (PR #25 in claude-code template), the outer wait_for
fired first, run_executor_smoke saw "execution proceeding past
imports → timeout → PASS" and shipped the broken image to GHCR.

Bumping to 90s lets the SDK time itself out, the executor's wedge
catch arm runs, and runtime_wedge.mark_wedged() flips the flag
that smoke_mode now reads. Outer `timeout` bumped to 120s — the
runner-level safety net stays slightly longer than the inner cap
so a smoke_mode regression that doesn't terminate surfaces as exit
124 with a clear error, not just exit 1.
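
The exit-code convention the outer wrapper relies on can be seen with coreutils `timeout` in isolation (a standalone illustration, not the workflow step itself):

```shell
# When the wrapper fires before the wrapped command finishes, coreutils
# `timeout` reports exit 124 — distinguishable from an ordinary
# failure's exit 1, which is why the outer cap stays above the inner.
set +e
timeout 1 sleep 5
rc=$?
set -e
echo "outer timeout fired: rc=${rc}"
```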

Step comment names this calibration explicitly so a future
contributor doesn't shrink it back without injecting a wedge in
the smoke_mode unit tests first. Error message references
runtime_wedge so a failure-mode reader knows where to look.
2026-05-01 17:48:51 -07:00
Hongming Wang
a79ef8e9fa
Merge pull request #32 from Molecule-AI/feat/template-validation-aggregator
ci: add Template validation aggregator (restore historical check name)
2026-04-30 23:01:43 -07:00
Hongming Wang
375bcc4376 ci(validate-workspace-template): add Template validation aggregator
The workflow was refactored from one `validate` job (display name
"Template validation") into matrix-named validate-static +
validate-runtime jobs ("(static)" / "(runtime)" suffixes) for
fork-PR security. The new check names — `validate / Template
validation (static)` and `validate / Template validation
(runtime)` — never match the original `validate / Template
validation` that template-repo branch protection requires. Result:
auto-merge silently hangs in BLOCKED forever on every template
repo because the required check never reports.

Add a third aggregator job `template-validation` (display name
"Template validation") that depends on both real jobs and emits
the original check name. `if: always()` so it reports out even
when validate-static fails — without that GitHub marks the
aggregator SKIPPED and branch protection still blocks because the
required check never reaches a final state.

Treats `skipped` as pass for validate-runtime so fork PRs (where
runtime is intentionally skipped on the security gate) don't
become un-mergeable.
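
A sketch of the aggregator-job shape described above (job/check names come from this message; the exact step logic is an assumption, not the shipped implementation):

```yaml
template-validation:
  name: Template validation     # emits the historical required check name
  needs: [validate-static, validate-runtime]
  if: always()                  # report a final state even when a dep fails
  runs-on: ubuntu-latest
  steps:
    - name: Aggregate results
      run: |
        static='${{ needs.validate-static.result }}'
        runtime='${{ needs.validate-runtime.result }}'
        [ "$static" = "success" ] || { echo "static: $static"; exit 1; }
        # `skipped` counts as pass for runtime (fork-PR security gate).
        case "$runtime" in
          success|skipped) exit 0 ;;
          *) echo "runtime: $runtime"; exit 1 ;;
        esac
```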

Caught while shipping the boot-smoke fixes for openclaw#11 and
hermes#29 — both PRs sat BLOCKED with all real checks green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:01:01 -07:00
Hongming Wang
2bbc6e0e80
Merge pull request #31 from Molecule-AI/fix/publish-template-smoke-cleanup
fix(publish-template-image): tolerate host-side uid 1000 ownership in smoke cleanup
2026-04-30 21:56:53 -07:00
Hongming Wang
da6407e58a fix(publish-template-image): make smoke-cleanup tolerate host-side uid 1000 ownership
Third hot-fix for #2275 Phase 2 — claude-code re-run #3 showed the
boot smoke ITSELF passing (`[smoke-mode] PASS: timed out past
import-tree (imports healthy)`), but the workflow step still exited 1
because the post-smoke cleanup `rm -rf "${SMOKE_CONFIG_DIR}"` failed
with `Permission denied`.

Root cause: the image entrypoint (entrypoint.sh) does
`chown -R agent:agent /configs` before exec'ing molecule-runtime as
uid 1000. Because /configs is a bind-mount of the host's mktemp dir,
the chown propagates to the host — the runner user (the GHA `runner`
account, NOT root) can no longer delete the files inside it. With
`set -e` in effect, that rm exit propagates and we report failure
even though the gate itself passed.

Fix: best-effort rm with sudo fallback and final `|| true`. The
runner is ephemeral; /tmp gets cleaned automatically at job teardown.

Verified against run 25202859503 which showed every other step green
+ the smoke itself passing — only this rm was the blocker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:56:36 -07:00
Hongming Wang
86092315a7
Merge pull request #30 from Molecule-AI/fix/publish-template-smoke-pythonpath
fix(publish-template-image): inject PYTHONPATH=/app for boot smoke
2026-04-30 21:54:18 -07:00
Hongming Wang
a9df950801 fix(publish-template-image): inject PYTHONPATH=/app to match production provisioner
Second hot-fix for #2275 Phase 2 — boot smoke kept failing with
`ModuleNotFoundError: No module named 'adapter'` even after the
permissions fix landed.

Root cause: the production platform's provisioner sets PYTHONPATH=/app
on every workspace container (provisioner.go:563) so molecule-runtime —
a pip console_scripts entry point whose sys.path[0] is /usr/local/bin,
NOT /app — can resolve `importlib.import_module('adapter')`. The
existing static import smoke didn't hit this because `python3 -c "import
$mod"` adds cwd to sys.path; only the entry-point invocation needs
PYTHONPATH.
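
The sys.path behaviour in question can be reproduced without Docker (a standalone illustration; temp dirs stand in for the /usr/local/bin and /app paths from this message):

```python
import os
import subprocess
import sys
import tempfile

# A script invoked by path gets its OWN directory as sys.path[0] — just
# like a console_scripts entry point living in /usr/local/bin. The cwd
# is NOT on sys.path, so a module sitting there (the /app analogue) is
# invisible unless PYTHONPATH adds it.
with tempfile.TemporaryDirectory() as bindir, \
     tempfile.TemporaryDirectory() as appdir:
    with open(os.path.join(appdir, "adapter.py"), "w") as f:
        f.write("OK = True\n")
    entry = os.path.join(bindir, "entry.py")
    with open(entry, "w") as f:
        f.write("import importlib; importlib.import_module('adapter'); "
                "print('imported')\n")

    # Without PYTHONPATH: ModuleNotFoundError, even with cwd=appdir.
    r1 = subprocess.run([sys.executable, entry], cwd=appdir,
                        capture_output=True, text=True)
    # With PYTHONPATH pointing at the app dir: the import succeeds.
    env = dict(os.environ, PYTHONPATH=appdir)
    r2 = subprocess.run([sys.executable, entry], cwd=appdir,
                        capture_output=True, text=True, env=env)
    print(r1.returncode, r2.stdout.strip())
```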

Mirrors prod by passing `-e PYTHONPATH=/app` in the docker run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:54:02 -07:00
Hongming Wang
b4e17014fa
Merge pull request #29 from Molecule-AI/fix/publish-template-smoke-perms
fix(publish-template-image): chmod a+rX + drop :ro so agent can read /configs
2026-04-30 21:49:46 -07:00
Hongming Wang
a5212a349b fix(publish-template-image): chmod a+rX + drop :ro so agent can read /configs
Hot-fix for #2275 Phase 2 — the boot smoke step in v1@3c8f8fe failed
on every template publish with `PermissionError: [Errno 13] Permission
denied: '/configs/config.yaml'` because `mktemp -d` creates the dir
with mode 700 and `chmod -R go+r` adds 'r' to files but doesn't add
'x' to directories. Inside the image the entrypoint drops priv to
uid 1000 (agent), which then cannot traverse /configs to even reach
config.yaml — main.py exits before any executor code runs.

Two changes:
1. `chmod -R a+rX` (capital X) adds 'x' to directories AND
   already-executable files, so the temp dir becomes traversable for agent
   while config.yaml stays a regular world-readable file.
2. Drop `:ro` on the mount so the entrypoint's `chown -R agent
   /configs` succeeds. The container is ephemeral; modifications to
   the host mktemp dir don't matter and the dir gets nuked right
   after the smoke run.
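
The go+r vs a+rX difference is easy to demonstrate in isolation (a standalone sketch; GNU coreutils `stat -c` assumed):

```shell
d=$(mktemp -d)                  # mktemp -d creates the dir with mode 700
touch "$d/config.yaml"; chmod 600 "$d/config.yaml"

chmod -R go+r "$d"              # adds 'r' but no 'x': dir is 744,
mid_dir=$(stat -c '%a' "$d")    # still un-traversable for other users

chmod -R a+rX "$d"              # capital X: 'x' lands on the directory,
                                # not on the plain (non-executable) file
dir_mode=$(stat -c '%a' "$d")
file_mode=$(stat -c '%a' "$d/config.yaml")
echo "go+r: dir=$mid_dir  a+rX: dir=$dir_mode file=$file_mode"
rm -rf "$d"
```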

Reproduced + diagnosed against claude-code publish run 25202651546
which failed within a few seconds on Path('/configs/config.yaml').exists()
in molecule_runtime/config.py:298.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:49:26 -07:00
Hongming Wang
3c8f8fe48b
Merge pull request #28 from Molecule-AI/feat/publish-template-image-boot-smoke
feat(publish-template-image): add execute()-against-stub-deps boot smoke (#2275)
2026-04-30 21:44:30 -07:00
Hongming Wang
434d1782e6 feat(publish-template-image): add execute()-against-stub-deps boot smoke (#2275)
Adds a step between the existing import smoke and the GHCR push that
boots the just-built image with MOLECULE_SMOKE_MODE=1, which routes
molecule-runtime through the new smoke_mode.run_executor_smoke() —
invokes executor.execute(stub_ctx, stub_queue) once with a 10s timeout.

Healthy import tree → execution proceeds far enough to hit a network
boundary and times out (exit 0). Broken lazy import inside an
`async def execute(...)` body → ImportError/ModuleNotFoundError
(exit 1). The 2026-04-2x v0→v1 a2a-sdk migration shipped 5 such
regressions in templates that the existing static import smoke missed.

Skip path: when the installed runtime predates 0.1.60 (pre-smoke_mode),
the step prints a warning + exits 0. Templates pinned to older runtimes
keep publishing without this gate flipping red; cascade-triggered
builds (which forward the just-published version as RUNTIME_VERSION)
get the gate automatically.

Belt-and-suspenders `timeout 60` wrapper so smoke_mode itself can't
wedge the runner past one minute per template.

After merge, bump v1 tag to point at the new main SHA (caller repos
pin to @v1; the change has no effect until the moving tag advances).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:41:35 -07:00
Hongming Wang
53f01d5b44
Merge pull request #27 from Molecule-AI/auto/p135-fork-pr-lockdown
ci: lock down validate-workspace-template against fork-PR untrusted code (P135)
2026-04-30 01:08:40 -07:00
11 changed files with 427 additions and 28 deletions

View File

@@ -0,0 +1,55 @@
name: 'Audit force-merge'
description: >-
  §SOP-6 force-merge audit. Detects PRs merged with required-status-checks
  not green at HEAD SHA and emits incident.force_merge JSON to runner
  stdout. Vector docker_logs source ships the line to Loki on
  molecule-canonical-obs (per reference_obs_stack_phase1).

# Why a composite action and not a reusable workflow:
# Gitea 1.22.6 does NOT support cross-repo `uses: org/repo/.gitea/
# workflows/X.yml@ref`. Cross-repo reusable workflows landed in
# go-gitea/gitea PR #32562 in Gitea 1.26.0 (Oct 2025). On 1.22.x the
# clone fails because act_runner mints a caller-scoped GITEA_TOKEN.
# Composite actions resolve via the actions-fetch path which works
# cross-repo on 1.22 against a public callee — that's us. Re-evaluate
# this choice when the operator host upgrades to Gitea ≥ 1.26.

inputs:
  gitea-token:
    description: >-
      PAT for sop-tier-bot (or equivalent read-only audit identity).
      Needs read:user,read:repository,read:issue scopes — admin scope
      is intentionally NOT required.
    required: true
  gitea-host:
    description: 'Gitea host'
    required: false
    default: 'git.moleculesai.app'
  repo:
    description: 'owner/name; typically ${{ github.repository }}'
    required: true
  pr-number:
    description: 'PR number; typically ${{ github.event.pull_request.number }}'
    required: true
  required-checks:
    description: >-
      Newline-separated required-status-check context names. Mirror
      of branch protection's status_check_contexts. Declared at the
      caller because /branch_protections requires admin scope which
      this audit identity intentionally does not hold (least-privilege).
      When the required-check set changes, update both branch
      protection AND this input.
    required: true

runs:
  using: composite
  steps:
    - name: Detect force-merge + emit audit event
      shell: bash
      env:
        GITEA_TOKEN: ${{ inputs.gitea-token }}
        GITEA_HOST: ${{ inputs.gitea-host }}
        REPO: ${{ inputs.repo }}
        PR_NUMBER: ${{ inputs.pr-number }}
        REQUIRED_CHECKS: ${{ inputs.required-checks }}
      run: bash "$GITHUB_ACTION_PATH/audit.sh"

View File

@@ -0,0 +1,118 @@
#!/usr/bin/env bash
# audit-force-merge — detect a §SOP-6 force-merge on a closed PR, emit
# `incident.force_merge` to stdout as structured JSON.
#
# Invoked by the `audit-force-merge` composite action defined alongside
# this script (action.yml). Caller workflows fire on
# `pull_request_target: closed` and gate on `merged == true`. See
# action.yml for the supported inputs.
#
# Vector's docker_logs source picks up runner stdout; the JSON gets
# shipped to Loki on molecule-canonical-obs, indexable by event_type.
# Query example:
#
#   {host="operator"} |= "event_type" |= "incident.force_merge" | json
#
# A force-merge is detected when a merged PR had at least one of the
# caller-declared required-status-check contexts in a state other than
# "success" at the PR HEAD. That's exactly what the Gitea
# force_merge:true API call lets through, so it's a faithful detector
# of the override path.
#
# Required env (set by the composite action via inputs):
#   GITEA_TOKEN, GITEA_HOST, REPO, PR_NUMBER, REQUIRED_CHECKS
#
# REQUIRED_CHECKS is newline-separated context names. Declared by the
# caller (mirror of branch protection's status_check_contexts) rather
# than fetched from /branch_protections, which requires admin scope —
# the audit identity is intentionally read-only (least-privilege; see
# memory/feedback_least_privilege_via_workflow_env).

set -euo pipefail

: "${GITEA_TOKEN:?required}"
: "${GITEA_HOST:?required}"
: "${REPO:?required}"
: "${PR_NUMBER:?required}"
: "${REQUIRED_CHECKS:?required (newline-separated context names)}"

OWNER="${REPO%%/*}"
NAME="${REPO##*/}"
API="https://${GITEA_HOST}/api/v1"
AUTH="Authorization: token ${GITEA_TOKEN}"

# 1. Fetch the PR. If not merged, no-op.
PR=$(curl -sS -H "$AUTH" "${API}/repos/${OWNER}/${NAME}/pulls/${PR_NUMBER}")
MERGED=$(echo "$PR" | jq -r '.merged // false')
if [ "$MERGED" != "true" ]; then
  echo "::notice::PR #${PR_NUMBER} closed without merge — no audit emission."
  exit 0
fi

MERGE_SHA=$(echo "$PR" | jq -r '.merge_commit_sha // empty')
MERGED_BY=$(echo "$PR" | jq -r '.merged_by.login // "unknown"')
TITLE=$(echo "$PR" | jq -r '.title // ""')
BASE_BRANCH=$(echo "$PR" | jq -r '.base.ref // "main"')
HEAD_SHA=$(echo "$PR" | jq -r '.head.sha // empty')
if [ -z "$MERGE_SHA" ]; then
  echo "::warning::PR #${PR_NUMBER} merged=true but no merge_commit_sha — cannot evaluate force-merge."
  exit 0
fi

# 2. Required status checks declared in the workflow env.
REQUIRED="$REQUIRED_CHECKS"
if [ -z "${REQUIRED//[[:space:]]/}" ]; then
  echo "::notice::REQUIRED_CHECKS empty — force-merge not applicable."
  exit 0
fi

# 3. Status-check state at the PR HEAD (where checks ran). The merge
#    commit doesn't get its own checks; we evaluate the PR's last
#    commit, which is what branch protection compared against.
STATUS=$(curl -sS -H "$AUTH" \
  "${API}/repos/${OWNER}/${NAME}/commits/${HEAD_SHA}/status")
declare -A CHECK_STATE
while IFS=$'\t' read -r ctx state; do
  [ -n "$ctx" ] && CHECK_STATE[$ctx]="$state"
done < <(echo "$STATUS" | jq -r '.statuses // [] | .[] | "\(.context)\t\(.status)"')

# 4. For each required check, was it green at merge? YAML block scalars
#    (`|`) leave a trailing newline; skip blank/whitespace-only lines.
FAILED_CHECKS=()
while IFS= read -r req; do
  trimmed="${req#"${req%%[![:space:]]*}"}"          # ltrim
  trimmed="${trimmed%"${trimmed##*[![:space:]]}"}"  # rtrim
  [ -z "$trimmed" ] && continue
  state="${CHECK_STATE[$trimmed]:-missing}"
  if [ "$state" != "success" ]; then
    FAILED_CHECKS+=("${trimmed}=${state}")
  fi
done <<< "$REQUIRED"

if [ "${#FAILED_CHECKS[@]}" -eq 0 ]; then
  echo "::notice::PR #${PR_NUMBER} merged with all required checks green — not a force-merge."
  exit 0
fi

# 5. Emit structured audit event.
NOW=$(date -u +%Y-%m-%dT%H:%M:%SZ)
FAILED_JSON=$(printf '%s\n' "${FAILED_CHECKS[@]}" | jq -R . | jq -s .)
# Print as a single-line JSON so Vector's parse_json transform can pick
# it up cleanly from docker_logs.
jq -nc \
  --arg event_type "incident.force_merge" \
  --arg ts "$NOW" \
  --arg repo "$REPO" \
  --argjson pr "$PR_NUMBER" \
  --arg title "$TITLE" \
  --arg base "$BASE_BRANCH" \
  --arg merged_by "$MERGED_BY" \
  --arg merge_sha "$MERGE_SHA" \
  --argjson failed_checks "$FAILED_JSON" \
  '{event_type: $event_type, ts: $ts, repo: $repo, pr: $pr, title: $title,
    base_branch: $base, merged_by: $merged_by, merge_sha: $merge_sha,
    failed_checks: $failed_checks}'

echo "::warning::FORCE-MERGE detected on PR #${PR_NUMBER} by ${MERGED_BY}: ${#FAILED_CHECKS[@]} required check(s) not green at merge time."

View File

@@ -21,7 +21,7 @@ name: Auto-promote branch (reusable)
# administration: read # read branch protection (REQUIRED — see below)
# jobs:
# promote:
-# uses: Molecule-AI/molecule-ci/.github/workflows/auto-promote-branch.yml@v1
+# uses: molecule-ai/molecule-ci/.github/workflows/auto-promote-branch.yml@v1
# with:
# from-branch: staging
# to-branch: main

View File

@@ -28,7 +28,7 @@ name: Auto-promote staging → main (PR-based, reusable)
# pull-requests: write
# jobs:
# promote:
-# uses: Molecule-AI/molecule-ci/.github/workflows/auto-promote-staging-pr.yml@v1
+# uses: molecule-ai/molecule-ci/.github/workflows/auto-promote-staging-pr.yml@v1
# with:
# gates: "ci.yml,e2e-staging-canvas.yml,e2e-api.yml,codeql.yml"
# force: ${{ github.event.inputs.force == 'true' }}
@@ -230,7 +230,7 @@ jobs:
cat > "$BODY_FILE" <<EOFBODY
Automated promotion of \`${SOURCE_BRANCH}\` (\`${TARGET_SHA:0:8}\`) to \`${TARGET_BRANCH}\`. Required gates green at this SHA: ${GATES_CSV}.
-This PR is auto-generated by a thin caller of \`Molecule-AI/molecule-ci/.github/workflows/auto-promote-staging-pr.yml\` whenever every required gate completes green on the same source-branch SHA. It exists because protected branches require status checks "set by the expected GitHub apps" — direct \`git push\` from a workflow can't satisfy that, only PR merges through the queue can.
+This PR is auto-generated by a thin caller of \`molecule-ai/molecule-ci/.github/workflows/auto-promote-staging-pr.yml\` whenever every required gate completes green on the same source-branch SHA. It exists because protected branches require status checks "set by the expected GitHub apps" — direct \`git push\` from a workflow can't satisfy that, only PR merges through the queue can.
Merge queue lands this; no human action needed unless gates fail.
EOFBODY

View File

@@ -4,7 +4,7 @@ name: Auto-promote staging → main
# `auto-promote-branch.yml` workflow factored out for org-wide reuse.
# Other repos consume the same reusable workflow via:
#
-# uses: Molecule-AI/molecule-ci/.github/workflows/auto-promote-branch.yml@v1
+# uses: molecule-ai/molecule-ci/.github/workflows/auto-promote-branch.yml@v1
#
# Excluded by policy: molecule-core + molecule-controlplane stay
# manual per CEO directive 2026-04-24. Those repos do NOT call the

View File

@@ -22,7 +22,7 @@ name: Disable auto-merge on push
# pull-requests: write
# jobs:
# disable-auto-merge-on-push:
-# uses: Molecule-AI/molecule-ci/.github/workflows/disable-auto-merge-on-push.yml@v1
+# uses: molecule-ai/molecule-ci/.github/workflows/disable-auto-merge-on-push.yml@v1
#
# False-positive behavior: if a CI bot pushes (e.g. dependency-update
# rebase, secret rotation), this also disables auto-merge for that

View File

@@ -1,6 +1,6 @@
name: Publish Workspace Template Image
-# Reusable workflow for every Molecule-AI/molecule-ai-workspace-template-*
+# Reusable workflow for every molecule-ai/molecule-ai-workspace-template-*
# repo. Builds the template's Dockerfile on main and pushes to GHCR as
# `ghcr.io/molecule-ai/workspace-template-<runtime>:latest` (plus a
# per-commit `sha-<7>` tag). Auto-derives <runtime> from the caller repo
@@ -17,7 +17,7 @@ name: Publish Workspace Template Image
# packages: write
# jobs:
# publish:
-# uses: Molecule-AI/molecule-ci/.github/workflows/publish-template-image.yml@v1
+# uses: molecule-ai/molecule-ci/.github/workflows/publish-template-image.yml@v1
# secrets: inherit
#
# Runner choice (2026-04-22): ubuntu-latest
@@ -239,6 +239,140 @@ jobs:
'
echo "::notice::✓ ${IMAGE} all /app/*.py modules import cleanly against installed runtime"
- name: Boot smoke — execute() against stub deps (#2275, task #131)
# The static import smoke above only IMPORTs /app/*.py — lazy
# imports buried inside `async def execute(...)` bodies (e.g.
# `from a2a.types import FilePart`) NEVER evaluate at static-
# import time. The 2026-04-2x v0→v1 a2a-sdk migration shipped 5
# such regressions in templates that all looked fine at module-
# load smoke (claude-code, langgraph, deepagents, gemini-cli,
# hermes — every one a separate provisioning incident).
#
# This step boots the image with MOLECULE_SMOKE_MODE=1, which
# routes molecule-runtime through smoke_mode.run_executor_smoke()
# — invokes executor.execute(stub_ctx, stub_queue) once with a
# short timeout. Healthy import tree → execution proceeds far
# enough to hit a network boundary and times out (exit 0).
# Broken lazy import → ImportError/ModuleNotFoundError from
# inside the executor body (exit 1).
#
# Universal turn-smoke (task #131): run_executor_smoke also
# consults runtime_wedge.is_wedged() at the end of every result
# path and upgrades a provisional PASS to FAIL when an adapter
# marked the runtime wedged. Catches PR-25-class regressions
# (claude-agent-sdk init wedge from a malformed CLI argv) where
# the SDK takes 60s to time out on `initialize()` — the outer
# wait_for must outlast that handshake so the adapter's wedge
# catch arm runs before the smoke gives up. That's why the
# smoke timeout is 90s (NOT the original 10s) and the outer
# `timeout` wrapper is 120s (NOT 60s). Lowering either back
# makes this gate blind to init-wedge bugs again — confirm with
# an injected wedge in test_smoke_mode.py before changing.
#
# Requires runtime >= 0.1.60 (the version that introduced
# smoke_mode). Older runtimes silently no-op and would hang on
# uvicorn, so we detect the module first and skip if absent —
# this lets templates pinned to older runtimes continue to
# publish without this gate flipping red, while every fresh
# cascade-triggered build (which forwards the just-published
# version as RUNTIME_VERSION) gets the gate automatically.
#
# Wrapped in `timeout` as a belt-and-suspenders safety net in
# case smoke_mode itself wedges — runner shouldn't hang
# indefinitely on a single template.
shell: bash
env:
IMAGE: ${{ steps.tags.outputs.image }}:sha-${{ steps.tags.outputs.sha }}
run: |
set -eu
HAS_SMOKE_MODE=$(docker run --rm --entrypoint sh "${IMAGE}" -c \
'python3 -c "import molecule_runtime.smoke_mode" >/dev/null 2>&1 && echo yes || echo no')
if [ "${HAS_SMOKE_MODE}" = "no" ]; then
echo "::warning::installed runtime predates molecule-core#2275 (no molecule_runtime.smoke_mode); skipping boot smoke. Bump requirements.txt to molecule-ai-workspace-runtime>=0.1.60 to enable."
exit 0
fi
if [ ! -f config.yaml ]; then
echo "::error::config.yaml not found at repo root — boot smoke needs it to populate /configs. Templates without a config.yaml at root cannot be boot-smoked; either add one or skip this gate by setting an old runtime pin."
exit 1
fi
# Mount the repo's own config.yaml at /configs so the runtime
# can reach create_executor() — that's where the lazy imports
# we want to test actually live. The image's entrypoint drops
# priv from root to agent (uid 1000) before exec'ing
# molecule-runtime, so /configs needs to be readable AND
# traversable from uid 1000.
#
# Use `a+rX` (capital X — only adds x where it's already
# executable, i.e. directories): mktemp -d creates the dir
# with mode 700, so a bare `go+r` would leave the dir
# un-traversable for agent and config.py would
# PermissionError on `Path('/configs/config.yaml').exists()`.
# Mount RW (not :ro) so the entrypoint's `chown -R agent
# /configs` succeeds — its silent chown failure on a :ro
# mount was the original symptom.
SMOKE_CONFIG_DIR=$(mktemp -d)
cp config.yaml "${SMOKE_CONFIG_DIR}/"
chmod -R a+rX "${SMOKE_CONFIG_DIR}"
# Stub credentials — adapters validate shape at create_executor
# time but the smoke times out before any real call goes out.
# Set the common ones so any adapter that early-validates a
# specific key sees a non-empty value.
# PYTHONPATH=/app mirrors what the platform's provisioner
# injects at workspace startup (workspace-server/internal/
# provisioner/provisioner.go:563). Without it,
# `importlib.import_module('adapter')` in the runtime's
# preflight check fails with ModuleNotFoundError because
# molecule-runtime is a console_scripts entry point —
# sys.path[0] is /usr/local/bin, NOT /app. The existing
# static import smoke step above doesn't hit this because
# `python3 -c "import $mod"` adds cwd to sys.path; only the
# entry-point invocation needs PYTHONPATH.
set +e
# MOLECULE_SMOKE_TIMEOUT_SECS=90 is calibrated to outlast
# claude-agent-sdk's 60s initialize() handshake (see step
# comment above + workspace/smoke_mode.py top docstring) so
# adapter wedge catch arms run before run_executor_smoke
# gives up. Outer `timeout 120` is the runner-level safety
# net — slightly longer than the inner timeout so a hung
# smoke_mode itself surfaces as exit 124 and gets a clear
# error message instead of just `exit 1`.
timeout 120 docker run --rm \
-v "${SMOKE_CONFIG_DIR}:/configs" \
-e WORKSPACE_ID=fake-smoke \
-e PYTHONPATH=/app \
-e MOLECULE_SMOKE_MODE=1 \
-e MOLECULE_SMOKE_TIMEOUT_SECS=90 \
-e CLAUDE_CODE_OAUTH_TOKEN=sk-fake-smoke-token \
-e ANTHROPIC_API_KEY=sk-fake-smoke-key \
-e GEMINI_API_KEY=fake-smoke-key \
-e OPENAI_API_KEY=sk-fake-smoke-key \
"${IMAGE}"
rc=$?
set -e
# Cleanup is best-effort: the entrypoint chowns /configs to
# uid 1000 (agent) inside the container, which propagates to
# the host bind-mount, leaving the runner user unable to
# remove the files. Fall back to `sudo rm` and ignore any
# remaining failure — the runner is ephemeral, /tmp is
# cleaned automatically post-job.
rm -rf "${SMOKE_CONFIG_DIR}" 2>/dev/null \
|| sudo rm -rf "${SMOKE_CONFIG_DIR}" 2>/dev/null \
|| true
if [ "${rc}" -eq 124 ]; then
echo "::error::boot smoke wedged past 120s — smoke_mode itself failed to terminate (look for blocking calls before MOLECULE_SMOKE_TIMEOUT_SECS fires)"
exit 1
fi
if [ "${rc}" -ne 0 ]; then
echo "::error::boot smoke failed (exit ${rc}) — executor.execute() raised an import error OR an adapter marked runtime_wedge.is_wedged() (PR-25-class init wedge). Check the container log above for the offending lazy import or wedge reason."
exit "${rc}"
fi
echo "::notice::✓ ${IMAGE} executor.execute() smoke passed (imports healthy, no runtime wedge)"
- name: Push image to GHCR (post-smoke)
# Now that the smoke test passed, push both tags. build-push-action
# reuses the cached build from the load step above, so this is fast.
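
The inner/outer timeout layering described in the step comments can be exercised outside CI. A minimal sketch (hypothetical, calling coreutils `timeout` directly instead of the `docker run` invocation) of why exit code 124 is the distinguishable "outer limit fired" signal:

```python
import subprocess

# coreutils `timeout` exits 124 when the wrapped command outlives the
# limit -- the same signal the workflow uses to tell "smoke_mode hung"
# apart from "smoke_mode failed with its own exit code".
hung = subprocess.run(["timeout", "1", "sleep", "10"])
clean_fail = subprocess.run(["timeout", "5", "sh", "-c", "exit 3"])

print(hung.returncode)        # 124: outer limit fired
print(clean_fail.returncode)  # 3: inner command's own status passes through
```

This is why the outer `timeout 120` must exceed the inner `MOLECULE_SMOKE_TIMEOUT_SECS=90`: any non-124 status is the smoke's own verdict, and 124 only ever means the smoke itself wedged.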


@@ -15,10 +15,10 @@ jobs:
# 5 org-template repos as the validator evolved. Single source of
# truth eliminates that drift class entirely. Mirrors the same
# pattern already used by validate-workspace-template.yml.
- uses: actions/checkout@v4
with:
repository: Molecule-AI/molecule-ci
path: .molecule-ci-canonical
# Direct git-clone — see validate-plugin.yml for the rationale.
# Anonymous fetch of public molecule-ci, no actions/checkout idiosyncrasies.
- name: Fetch molecule-ci canonical scripts
run: git clone --depth 1 https://git.moleculesai.app/molecule-ai/molecule-ci.git .molecule-ci-canonical
- uses: actions/setup-python@v5
with:
python-version: "3.11"


@@ -15,10 +15,19 @@ jobs:
# 20+ plugin repos as the validator evolved. Single source of
# truth eliminates that drift class entirely. Mirrors the same
# pattern already used by validate-workspace-template.yml.
- uses: actions/checkout@v4
with:
repository: Molecule-AI/molecule-ci
path: .molecule-ci-canonical
# Direct git-clone instead of actions/checkout@v4 because:
# (a) actions/checkout@v4 sends Authorization: basic <github.token> by default,
# and Gitea 404s the cross-repo authenticated request (different from
# GitHub which falls back to anon-public-read).
# (b) Passing token: '' triggers actions/checkout's runtime "Input required
# and not supplied: token" error — the input is documented as
# required:false but the action's runtime calls getInput with
# required:true on its auth-helper path.
# Anonymous git clone of public molecule-ci has neither problem.
# See molecule-ci#1 (lowercase fix) + #2 (token:'' attempt) +
# the post-merge CI run on plugin-molecule-careful-bash@663bf72.
- name: Fetch molecule-ci canonical scripts
run: git clone --depth 1 https://git.moleculesai.app/molecule-ai/molecule-ci.git .molecule-ci-canonical
- uses: actions/setup-python@v5
with:
python-version: "3.11"


@@ -54,10 +54,10 @@ jobs:
# template repos as the validator evolved. Single source of truth
# eliminates that drift class entirely — every template runs the
# same canonical contract check on every CI run.
- uses: actions/checkout@v4
with:
repository: Molecule-AI/molecule-ci
path: .molecule-ci-canonical
# Direct git-clone — see validate-plugin.yml for the rationale.
# Anonymous fetch of public molecule-ci, no actions/checkout idiosyncrasies.
- name: Fetch molecule-ci canonical scripts
run: git clone --depth 1 https://git.moleculesai.app/molecule-ai/molecule-ci.git .molecule-ci-canonical
- uses: actions/setup-python@v5
with:
python-version: "3.11"
@@ -133,10 +133,10 @@ jobs:
if: github.event.pull_request.head.repo.fork != true
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v4
with:
repository: Molecule-AI/molecule-ci
path: .molecule-ci-canonical
# Direct git-clone — see validate-plugin.yml for the rationale.
# Anonymous fetch of public molecule-ci, no actions/checkout idiosyncrasies.
- name: Fetch molecule-ci canonical scripts
run: git clone --depth 1 https://git.moleculesai.app/molecule-ai/molecule-ci.git .molecule-ci-canonical
- uses: actions/setup-python@v5
with:
python-version: "3.11"
@@ -164,3 +164,47 @@ jobs:
- name: Docker build smoke test
if: hashFiles('Dockerfile') != ''
run: docker build -t template-test . --no-cache 2>&1 | tail -5 && echo "✓ Docker build succeeded"
# Aggregator that emits a single `Template validation` check name —
# the caller's job (`validate:` in each template's ci.yml) plus this
# job's name produces `validate / Template validation`, which is what
# template-repo branch protection has historically required.
#
# Why it's needed: the workflow was refactored from one job into
# validate-static + validate-runtime (with matrix-suffixed display
# names) for fork-PR security. The matrix names never match the
# original required-check name, so PR auto-merge silently hung in
# BLOCKED forever on every template repo (caught while shipping
# fixes for the boot-smoke gate, openclaw#11 + hermes#29).
#
# `if: always()` so it reports out even when validate-static fails —
# without that, GitHub marks the aggregator as SKIPPED and branch
# protection still blocks because the required check never reports
# a final state.
#
# Fork-PR semantics: validate-runtime is intentionally skipped on
# fork PRs (security gate). Treat `skipped` as a pass for the
# aggregator on forks so static-only coverage doesn't make every
# external PR un-mergeable.
template-validation:
name: Template validation
runs-on: ubuntu-latest
needs: [validate-static, validate-runtime]
if: always()
timeout-minutes: 1
steps:
- name: Aggregate
run: |
static="${{ needs.validate-static.result }}"
runtime="${{ needs.validate-runtime.result }}"
echo "validate-static: $static"
echo "validate-runtime: $runtime"
if [ "$static" != "success" ]; then
echo "::error::validate-static did not succeed: $static"
exit 1
fi
if [ "$runtime" != "success" ] && [ "$runtime" != "skipped" ]; then
echo "::error::validate-runtime did not succeed: $runtime"
exit 1
fi
echo "::notice::Template validation aggregate passed (static=$static, runtime=$runtime)"
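
The aggregate step's pass/fail rule (static must succeed outright; runtime passes when it succeeded or was intentionally skipped on fork PRs) can be restated as a plain function. A hypothetical sketch mirroring the bash above:

```python
def aggregate(static: str, runtime: str) -> bool:
    """Mirror of the aggregator step's logic: validate-static must be
    'success'; validate-runtime counts as passing when it is 'success'
    OR 'skipped' (the deliberate fork-PR security skip)."""
    if static != "success":
        return False
    return runtime in ("success", "skipped")

# Fork PR: runtime skipped, static green -> mergeable.
print(aggregate("success", "skipped"))   # True
# Runtime failed -> aggregator blocks the merge.
print(aggregate("success", "failure"))   # False
# Static failed -> blocks regardless of runtime.
print(aggregate("failure", "success"))   # False
```

Treating `skipped` as a pass only on the runtime leg is the key asymmetry: a skipped static check should never green-light a merge, but a skipped runtime check is the designed fork-PR state.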


@@ -2,19 +2,47 @@
"""Validate a Molecule AI org template repo."""
import os, sys, yaml
# Support !include and other custom YAML tags used by org templates.
# These resolve at platform load time, not at validation time — we just
# need to parse past them without crashing.
# Support custom YAML tags used by org templates. Two shapes:
#
# - `!include teams/pm.yaml` → scalar string referencing another YAML
# file in the same repo. Platform inlines at load time.
#
# - `!external\n repo: ...\n ref: ...\n path: ...` → mapping
# referencing a workspace tree to fetch from another repo. Platform
# fetches into a content-addressable cache at load time
# (internal#77 / molecule-core#105).
#
# Both shapes resolve at platform load time, not at validation time.
# The validator treats them as opaque references — it does NOT chase
# them down. We mark each parsed value with a sentinel subtype so the
# `validate_workspace` walk knows to skip them rather than tripping
# the "missing 'name'" branch.
class IncludeRef(str):
"""`!include path/to.yaml` — opaque reference, skipped by validator."""
class ExternalRef(dict):
"""`!external` mapping — opaque reference, skipped by validator."""
class PermissiveLoader(yaml.SafeLoader):
pass
def _include_constructor(loader, node):
return IncludeRef(loader.construct_scalar(node))
def _external_constructor(loader, node):
return ExternalRef(loader.construct_mapping(node))
def _generic_constructor(loader, tag_suffix, node):
# Fallback for unknown tags. Preserve the parsed shape so legacy
# docs that lean on tags we have not modeled yet still parse.
if isinstance(node, yaml.MappingNode):
return loader.construct_mapping(node)
if isinstance(node, yaml.SequenceNode):
return loader.construct_sequence(node)
return loader.construct_scalar(node)
PermissiveLoader.add_constructor("!include", _include_constructor)
PermissiveLoader.add_constructor("!external", _external_constructor)
PermissiveLoader.add_multi_constructor("!", _generic_constructor)
errors = []
@@ -33,7 +61,13 @@ if not org.get("workspaces") and not org.get("defaults"):
errors.append("org.yaml must have at least 'workspaces' or 'defaults'")
def validate_workspace(ws, path=""):
# !include tags resolve to strings at parse time; skip non-dicts
# `!include path/to.yaml` parses as IncludeRef (str subclass).
# `!external {repo, ref, path}` parses as ExternalRef (dict subclass).
# Both are opaque references — skip without chasing.
if isinstance(ws, (IncludeRef, ExternalRef)):
return []
# Legacy unknown-tag scalars (handled by _generic_constructor) stay
# as plain strings; they are not workspace dicts either.
if not isinstance(ws, dict):
return []
ws_errors = []
@@ -59,6 +93,11 @@ if errors:
def count_ws(nodes):
c = 0
for n in nodes:
# Skip opaque references — we do not know how many workspaces
# they expand to without resolving them, and resolution is the
# platform's job, not the validator's.
if isinstance(n, (IncludeRef, ExternalRef)):
continue
if not isinstance(n, dict):
continue
c += 1
@@ -66,4 +105,4 @@ def count_ws(nodes):
return c
total = count_ws(org.get("workspaces", []))
print(f"✓ org.yaml valid: {org['name']} ({total} workspaces)")
print(f"✓ org.yaml valid: {org['name']} ({total} direct workspaces; external refs not counted)")
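
The sentinel-subtype tag handling added in this diff can be sanity-checked standalone. A minimal sketch (assumes PyYAML is installed; the repo/ref/path values are made up for illustration), covering just the two modeled tags:

```python
import yaml


class IncludeRef(str):
    """`!include path/to.yaml` -- opaque reference, skipped by validator."""


class ExternalRef(dict):
    """`!external` mapping -- opaque reference, skipped by validator."""


class PermissiveLoader(yaml.SafeLoader):
    pass


PermissiveLoader.add_constructor(
    "!include", lambda loader, node: IncludeRef(loader.construct_scalar(node)))
PermissiveLoader.add_constructor(
    "!external", lambda loader, node: ExternalRef(loader.construct_mapping(node)))

doc = """
workspaces:
  - !include teams/pm.yaml
  - !external
    repo: molecule-ai/shared
    ref: main
    path: workspaces/qa.yaml
  - name: direct-ws
"""

org = yaml.load(doc, Loader=PermissiveLoader)
ws = org["workspaces"]
# The refs parse past without crashing, but keep a distinguishable type
# so isinstance checks in validate_workspace/count_ws can skip them.
print(type(ws[0]).__name__, type(ws[1]).__name__, ws[2]["name"])
# -> IncludeRef ExternalRef direct-ws
```

Because `IncludeRef` subclasses `str` and `ExternalRef` subclasses `dict`, code that never heard of the tags still sees ordinary values; only the validator's `isinstance((IncludeRef, ExternalRef))` guards treat them specially, which is why `ExternalRef` must be checked before the generic `isinstance(n, dict)` count in `count_ws`.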