fix(ci): port publish-runtime cascade to Gitea repo-dispatch API (closes #14) #20

Merged
claude-ceo-assistant merged 5 commits from fix/14-cascade-gitea-dispatch into staging 2026-05-07 10:36:33 +00:00

Closes molecule-core#14. Option A per devops-engineer + security-auditor consensus (Option B was unsafe — cron-poll does not cover the runtime → template fan-out axis).

Surface change

URL: api.github.com/repos//dispatches → /api/v1/repos//dispatches
Owner: Molecule-AI/... → molecule-ai/... (Gitea case-sensitive)
Auth: Authorization: Bearer → Authorization: token
Body shape: unchanged

GITEA_URL defaults to https://git.moleculesai.app, overridable via job env.

Out-of-band

DISPATCH_TOKEN secret must be re-minted as a Gitea PAT (was GitHub PAT). Per memory feedback_per_agent_gitea_identity_default, recommend a dedicated publish-runtime-bot persona with write:repository on the 9 template repos — NOT the founder PAT. Coordinate merge so token is in place before next runtime release.

Test plan

  • YAML parse OK
  • Post-merge: trigger a runtime publish (or dispatch one template); verify 204 + template publish-image fires + image re-pushed. Phase 4 belongs to internal#46 follow-up.

Hostile self-review (3 weakest spots) — see commit message body.

claude-ceo-assistant added 1 commit 2026-05-07 08:31:51 +00:00
fix(ci): port publish-runtime cascade to Gitea repo-dispatch API (closes molecule-core#14)
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
cascade-list-drift-gate / check (pull_request) Successful in 4s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 6s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 6s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Failing after 14s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 4s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 49s
CI / Canvas (Next.js) (pull_request) Failing after 1m55s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m20s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m24s
CI / Platform (Go) (pull_request) Successful in 2m5s
ce3f1f48a4
## Symptom

`publish-runtime.yml::cascade` fired a `repository_dispatch` to 10 workspace-template
repos via direct curl to `https://api.github.com/repos/...`. Post-2026-05-06 the
org's GitHub presence is suspended; every invocation 404s. The job only
emitted a `::warning::` on failure, so the error never propagated and the
runtime PyPI publish → template image rebuild pipeline stayed silently broken.

## Why Option A (rewrite) and not Option B (delete)

Verified 2026-05-07 by devops-engineer (molecule-core#14 thread):

- The cron-poll mechanism (/etc/cron.d/molecule-deploy-poll) tracks ONLY the
  Vercel/Railway-deployed repos (landingpage/docs/molecule-app/molecules-market
  /molecule-controlplane). It does NOT track workspace-template-* repos.
- Each of the 9 template `publish-image.yml` workflows has
  `repository_dispatch: types: [runtime-published]` as a load-bearing trigger.
  Without the cascade, when the runtime ships a new PyPI version, templates
  don't auto-rebuild.

So Option B (delete) would silently break the runtime → template fan-out.
Option A (rewrite to Gitea's API shape) is the right call. Security-auditor
agreed after seeing the cron-poll TRACKED list.

## API surface change

| Concern | Pre-fix (GitHub) | Post-fix (Gitea) |
|---|---|---|
| URL | `https://api.github.com/repos/$REPO/dispatches` | `${GITEA_URL}/api/v1/repos/$REPO/dispatches` |
| Owner case | `Molecule-AI/...` | `molecule-ai/...` (lowercase, Gitea is case-sensitive) |
| Auth header | `Authorization: Bearer $DISPATCH_TOKEN` | `Authorization: token $DISPATCH_TOKEN` |
| Body shape | `{event_type, client_payload}` | UNCHANGED — Gitea is GitHub-compatible here |
| Success code | `204 No Content` | `204 No Content` (unchanged) |

`GITEA_URL` defaults to `https://git.moleculesai.app`; overridable via job env.
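
For illustration only, the v1 surface mapping in the table above could be expressed as the small helper below. `dispatch_url` is a hypothetical name, not something in the workflow, and the sketch only builds the request target; a later comment in this PR reports that this endpoint returns 404 on Gitea 1.22.6, so it should not be read as a working call.

```shell
# Default from the commit message; overridable via the job environment.
GITEA_URL="${GITEA_URL:-https://git.moleculesai.app}"

# dispatch_url OWNER/REPO (hypothetical helper): print the v1 dispatch URL.
dispatch_url() {
  # Only the owner is lowercased, per the "Owner case" row of the table.
  owner=$(printf '%s' "${1%%/*}" | tr '[:upper:]' '[:lower:]')
  printf '%s/api/v1/repos/%s/%s/dispatches\n' "${GITEA_URL%/}" "$owner" "${1#*/}"
}
```

The matching auth header from the table would be `Authorization: token $DISPATCH_TOKEN`.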

## Out-of-band: DISPATCH_TOKEN secret rotation

The DISPATCH_TOKEN secret was a GitHub PAT. It must be re-minted as a Gitea
PAT for the new API to authenticate. Per saved memory
`feedback_per_agent_gitea_identity_default`, this should be a dedicated
`publish-runtime-bot` persona token with `write:repository` scope on the
9 target repos — NOT the founder PAT.

This PR ships the workflow change. Token rotation is the operator-host
follow-up (security-auditor's lane) — coordinate the merge so the token
is in place before the next runtime release fires.

## Backwards compatibility

The workflow has been silently broken since 2026-05-06: every invocation
returned 404 and emitted a `::warning::` without failing the job. Moving from
"silently broken" to "actually working" is therefore not a functional
regression. Any in-progress operator-managed manual dispatch path is
unaffected; the Gitea API path runs alongside it and requires no operator
intervention.

## Test plan

- [x] YAML parse OK on the modified workflow file
- [ ] Smoke test: trigger a runtime publish (or simulate via dispatching to one
      template) post-merge; verify HTTP 204 + the template's publish-image
      workflow fires + the template's image gets re-pushed against the new
      runtime version. Phase 4 verification belongs to internal#46 follow-up.

## Hostile self-review (3 weakest spots)

1. The fan-out remains best-effort with no retry: a single template failure
   surfaces only as a `::warning::` while the PyPI publish proceeds. With 9
   templates, each failed dispatch leaves one of them (about 11% of the
   fleet) on a stale image after a runtime bump.
   Defense: the warning shows up in the workflow summary; operators retry.
   Future hardening: requeue-on-fail with bounded retry, or a separate
   reconcile cron that detects template/runtime version drift and re-dispatches.

2. `DISPATCH_TOKEN` validity is enforced by the Gitea API (401 on stale)
   but the workflow doesn't differentiate 401 from 404. Either way the
   warning fires. Future hardening: explicit token-shape check at the start
   of the cascade job (curl `/api/v1/user` once, fail-fast if 401).

3. Owner-case lowercase is right today but couples the workflow to the
   current Gitea org slug. If the org is ever renamed, this workflow
   breaks silently. Less fragile alternative: derive REPO from a
   canonical source (e.g. the Gitea org listing,
   `GET /api/v1/orgs/{org}/repos`; `gh repo list` is the GitHub CLI and
   does not talk to Gitea) instead of string-concatenating. Acceptable
   today; filed under the same future hardening pass as item 1.
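
A minimal sketch of the fail-fast check suggested in item 2, under stated assumptions: `classify_status` and `preflight_token` are hypothetical helper names, the `::error::` line assumes the runner honours workflow commands, and only the `GET /api/v1/user` probe comes from the note above.

```shell
# classify_status CODE: map an HTTP status code to a coarse cause.
classify_status() {
  case "$1" in
    2??) echo ok ;;
    401|403) echo auth ;;
    404) echo not-found ;;
    *) echo other ;;
  esac
}

# preflight_token: one authenticated GET before the fan-out starts.
preflight_token() {
  # -w prints only the status code; a transport failure is recorded as 000.
  code=$(curl -sS -o /dev/null -w '%{http_code}' \
    -H "Authorization: token ${DISPATCH_TOKEN:-}" \
    "${GITEA_URL:-https://git.moleculesai.app}/api/v1/user") || code=000
  [ "$(classify_status "$code")" = ok ] && return 0
  echo "::error::DISPATCH_TOKEN preflight failed: HTTP $code ($(classify_status "$code"))"
  return 1
}
```

Calling `preflight_token || exit 1` at the top of the cascade job would turn a stale token into one clear failure instead of nine identical warnings.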

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claude-ceo-assistant added 1 commit 2026-05-07 09:38:24 +00:00
fix(ci): align secret name to plumbed DISPATCH_TOKEN (closes #14)
Some checks failed
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 3s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 7s
CI / Detect changes (pull_request) Successful in 7s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 8s
cascade-list-drift-gate / check (pull_request) Successful in 13s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 9s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 14s
E2E API Smoke Test / detect-changes (pull_request) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 10s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 19s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 12s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Failing after 19s
CI / Python Lint & Test (pull_request) Failing after 20s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 34s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m31s
CI / Platform (Go) (pull_request) Successful in 3m6s
CI / Canvas (Next.js) (pull_request) Failing after 3m8s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 14m54s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 15m3s
569df259ba
The cascade workflow was reading from `secrets.TEMPLATE_DISPATCH_TOKEN`
but the plumbed secret name is `DISPATCH_TOKEN` (verified just now via
GET /repos/molecule-ai/molecule-core/actions/secrets — only DISPATCH_TOKEN
is set). Without this rename the cascade would always evaluate "secret
missing" and exit 1 on the next push to staging, defeating the entire
point of grant-role-access.sh --apply that just landed.

Three references updated:
  - env mapping (`secrets.X` → `secrets.DISPATCH_TOKEN`)
  - workflow_dispatch warning text
  - push-trigger error text

The bash-side variable name is unchanged (still `DISPATCH_TOKEN`) so
the curl invocation at line 372 is unaffected. YAML round-trip parses
clean.
Author
Owner

Cannot merge as-is — Gitea dispatch API empirically does not exist

Follow-up to the secret-name fix in 569df259 (TEMPLATE_DISPATCH_TOKEN → DISPATCH_TOKEN to match the plumbed secret on this repo).

Verified empirically against this Gitea (1.22.6) — there is no repository_dispatch / workflow_dispatch trigger API:

| path | verb | result |
|------|------|--------|
| /api/v1/repos/{o}/{r}/dispatches | POST | 404 |
| /api/v1/repos/{o}/{r}/actions/dispatches | POST | 404 |
| /api/v1/repos/{o}/{r}/actions/workflows/dispatch | POST | 404 |
| /api/v1/repos/{o}/{r}/actions/workflow-runs | POST | 404 |
| /api/v1/repos/{o}/{r}/events | POST | 404 |
| /api/v1/repos/{o}/{r}/repository_dispatch | POST | 404 |

Swagger (/swagger.v1.json) confirms: the only actions/* endpoints in 1.22.6 are actions/secrets, actions/variables, actions/runners/registration-token. No trigger surface at all.
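
The probe above could be reproduced with something like the following sketch. `http_status` and `probe_paths` are hypothetical helper names and the auth header is an assumption; only the six candidate paths come from the table. Because these are POSTs, it is safest to run them against a scratch repo.

```shell
# http_status METHOD URL: print only the response status code.
http_status() {
  curl -sS -o /dev/null -w '%{http_code}' -X "$1" \
    -H "Authorization: token ${DISPATCH_TOKEN:-}" "$2" || true
}

# probe_paths BASE_URL OWNER/REPO: POST to each candidate, print "status path".
probe_paths() {
  for suffix in dispatches actions/dispatches actions/workflows/dispatch \
                actions/workflow-runs events repository_dispatch; do
    path="/api/v1/repos/$2/$suffix"
    printf '%s %s\n' "$(http_status POST "$1$path")" "$path"
  done
}
```

Usage would look like `probe_paths "$GITEA_URL" molecule-ai/molecule-core`, which prints one line per candidate path.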

The earlier API-shape note in this PR (claiming POST /api/v1/repos/{o}/{r}/dispatches would work) was wrong. Apologies — should have probed before designing.

Path forward

Proposing a v2 pivot: push-mode cascade. Each template already has on: push: branches: [main]. Replace the curl-dispatch loop with: clone each template, update a .runtime-version file with the published version, commit + push. The on-push fires the template's existing publish-image.yml. DISPATCH_TOKEN's existing write:repository scope is sufficient.

Awaiting orchestrator alignment before pushing v2.

Author
Owner

Phase 2 design — push-mode cascade for publish-runtime → templates

Empirically blocked v1: Gitea 1.22.6 has no repository_dispatch / workflow_dispatch trigger API. v2 substitutes git push as the cross-repo cascade signal. Each template already has on: push: branches: [main] and workflow_dispatch on its publish-image.yml — both of which fire the existing reusable build workflow. We hijack on: push.

1. .runtime-version file shape

Just the version string, one line, no trailing junk:

0.1.7

Path: repo root .runtime-version. No JSON, no signer, no timestamp.

Rationale:

  • A version string is the only field the cascade consumer (publish-image.yml's reusable workflow) needs. Adding sha + signer + timestamp would imply a verification protocol downstream — over-engineering for a non-existent threat model.
  • Git already records sha + author + timestamp on the commit that updates the file. If audit ever wants those, git log -- .runtime-version is canonical. Don't duplicate state.
  • Plain text means a human can cat it during incident triage. JSON is a small but real friction tax.

The template's publish-image.yml is updated separately (one-time PR per template, mechanical) to read the file and forward to the reusable workflow:

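jobs:
  resolve-version:
    runs-on: ubuntu-latest  # assumed runner label
    outputs:
      version: ${{ steps.rv.outputs.version }}
    steps:
      - uses: actions/checkout@v4  # the file must be checked out before it can be read
      - name: Resolve runtime version pin
        id: rv
        run: |
          if [ -f .runtime-version ]; then
            echo "version=$(head -n1 .runtime-version | tr -d '[:space:]')" >> "$GITHUB_OUTPUT"
          fi
  build:  # reusable workflows are called at job level, not as a step
    needs: resolve-version
    uses: molecule-ai/molecule-ci/.github/workflows/publish-template-image.yml@main
    with:
      runtime_version: ${{ github.event.client_payload.runtime_version || inputs.runtime_version || needs.resolve-version.outputs.version || '' }}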

Falls back to client_payload (legacy GitHub flow if it ever returns) → inputs (manual workflow_dispatch) → .runtime-version (push cascade) → empty (Dockerfile default).
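
To make the precedence concrete, here is the same fallback order as a small shell sketch. `resolve_runtime_version` is a hypothetical helper, not part of any workflow; it treats an empty string as "not provided", which mirrors how `||` behaves in the expression above.

```shell
# resolve_runtime_version PAYLOAD INPUT FILE
# Prints the first non-empty source: payload, then input, then FILE's first line.
resolve_runtime_version() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"
  elif [ -n "$2" ]; then
    printf '%s\n' "$2"
  elif [ -f "$3" ]; then
    # Same normalisation as the workflow step: first line, whitespace stripped.
    head -n1 "$3" | tr -d '[:space:]'
    echo
  else
    # Nothing set: empty output, so the Dockerfile default applies.
    echo
  fi
}
```

On a cascade push both the payload and the input are empty, so `resolve_runtime_version "" "" .runtime-version` would print the pinned version.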

2. Conflict handling

Strategy: pull-rebase loop, bounded retries, surface failure if exhausted.

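for attempt in 1 2 3; do
  cd "$tpl_clone"
  # If replaying our pin conflicts with a racing pin, drop ours and restart from the remote tip.
  git pull --rebase origin main || { git rebase --abort; git reset --hard origin/main; }
  echo "$VERSION" > .runtime-version
  # Stage first so a not-yet-tracked file is detected too.
  git add .runtime-version
  if ! git diff --cached --quiet -- .runtime-version; then
    git commit -m "chore: pin runtime to $VERSION (publish-runtime cascade)"
  fi
  if git push origin main 2>&1; then
    break
  fi
  # Out of retries: record the template so the job can fail at the end.
  if [ "$attempt" -eq 3 ]; then
    FAILED="$FAILED $tpl"
  fi
done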

Why pull-rebase over --force-with-lease:

  • --force-with-lease overwrites the racing publisher's commit silently. If v0.1.7 and v0.1.8 publishes race, force-with-lease means whoever pushes second wipes the other's commit and the file ends up with whichever version pushed last — but the lost commit is not visible in the log. Audit hostile.
  • Pull-rebase replays the loser's update on top of the winner's commit. Both versions are recorded in the log; the file ends up with whichever version pushed last (still racy on the file content, but the history is honest).

Bounded at 3 retries. After that, the template is in FAILED and the operator retries manually.

3. Failure handling — partial cascade

Partial-state is acceptable. Already used the same pattern in v1 (and the v1 design doc justified it). Three reasons:

  • Atomicity is impossible across 9 git remotes. No transaction primitive across them. Pretending otherwise just means rollback logic that's also non-atomic.
  • Each template is independent. A template stuck on the prior version still works for its tenants — degraded freshness, not a runtime break.
  • Retries are cheap. Operator re-runs publish-runtime with same version input → idempotent (see §4) → retries only the failed templates implicitly.

Implementation:

  • set +e around the cascade loop, collect FAILED.
  • Job exits 1 with the failure list — surfaces in CI / on-call sees it.
  • The PyPI publish step already succeeded by this point, so degradation is bounded to "templates lag PyPI" not "publish failed."
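
A minimal sketch of the collect-then-fail shape described in the bullets above. `push_pin` is a hypothetical stand-in for the per-template clone/commit/push, and `run_cascade` is likewise an illustrative name. Instead of `set +e`, the sketch tests each call inside an `if`, which has the same effect of letting the loop continue past a failure.

```shell
# push_pin TEMPLATE: hypothetical stand-in for the real clone/commit/push.
push_pin() {
  echo "would pin $1 to ${VERSION:-unset}"
}

# run_cascade TEMPLATE...: try every template, then fail once with the full list.
run_cascade() {
  FAILED=""
  for tpl in "$@"; do
    # Testing the call directly keeps one failure from aborting the loop.
    if ! push_pin "$tpl"; then
      FAILED="$FAILED $tpl"
    fi
  done
  if [ -n "$FAILED" ]; then
    echo "cascade failed for:$FAILED"
    return 1
  fi
}
```

The job step would end with `run_cascade $TEMPLATES || exit 1` (where `TEMPLATES` is an assumed space-separated list), so the failure list is the last thing printed before the non-zero exit.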

4. Idempotency — re-runs are no-ops

Before the commit step, diff .runtime-version against the new value:

echo "$VERSION" > .runtime-version
if git diff --quiet -- .runtime-version; then
  echo "no change to .runtime-version — skipping commit"
  continue
fi

When publish-runtime is re-fired with the same version (e.g. operator retrying after a partial failure), templates already at that version contribute zero commits. No spurious push, no spurious template rebuild.

Edge case: operator passes a lower version (downgrade). The diff is non-empty, so we'd commit + push the downgrade. Acceptable — that's the operator's stated intent. We don't second-guess.
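
One caveat with the diff-based check: plain `git diff` does not report untracked files, so it would see "no change" on a template where `.runtime-version` has never been committed. A git-free comparison avoids that. The sketch below is one possible shape; `pin_is_current` is a hypothetical helper name.

```shell
# pin_is_current FILE VERSION: succeed only when FILE already pins VERSION.
pin_is_current() {
  # A missing file is never current, so the first pin is always written.
  [ -f "$1" ] || return 1
  current=$(head -n1 "$1" | tr -d '[:space:]')
  [ "$current" = "$2" ]
}
```

Inside the per-template loop this would read `pin_is_current .runtime-version "$VERSION" && continue`. A downgrade is simply a different string, so it still goes through, matching the edge case above.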

5. Hostile self-review — 3 weakest spots

W1 — wall time scales linearly with template count. Today's 9 templates × (clone 5s + commit 1s + push 2s) ≈ 80s sequential, vs. v1's curl-burst at ~5s. Acceptable now; if the template list grows past ~20 the operator will notice. Mitigation if/when needed: parallelize via & + wait. Not in v1 to keep failure-attribution simple (parallel = interleaved logs).
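
If parallelism is ever needed, the interleaved-log concern could be handled by giving each template its own log file. This is a sketch under assumptions: `push_pin` is a hypothetical stand-in for the real per-template work, `cascade_parallel` is an illustrative name, and template names are assumed to contain no spaces or colons.

```shell
# push_pin TEMPLATE: hypothetical stand-in for the real clone/commit/push.
push_pin() { echo "would pin $1 to ${VERSION:-unset}"; }

# cascade_parallel LOGDIR TEMPLATE...: print the space-separated failed templates.
# The body is a subshell, so its variables do not leak into the caller.
cascade_parallel() (
  logdir=$1; shift
  pairs=""; failed=""
  for tpl in "$@"; do
    # One log file per template keeps the output attributable.
    push_pin "$tpl" >"$logdir/$tpl.log" 2>&1 &
    pairs="$pairs $tpl:$!"
  done
  for pair in $pairs; do
    # wait PID returns that background job's exit status.
    wait "${pair##*:}" || failed="$failed ${pair%%:*}"
  done
  echo "${failed# }"
  [ -z "$failed" ]
)
```

A caller could then print only the logs of the failed templates, which keeps failure attribution close to the sequential version.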

W2 — depends on a per-template publish-image.yml edit that doesn't exist yet. Today every template's publish-image.yml only forwards runtime_version from client_payload or inputs — neither populated on push. Until the 9 small PRs land that teach publish-image.yml to read .runtime-version, the cascade fires on: push but rebuilds with whatever requirements.txt already says. Sequencing requirement: land the 9 template-side PRs BEFORE merging molecule-core PR #20. Otherwise the first publish-runtime push triggers 9 builds that pin the old version — silently green CI, broken behavior. This is the highest-risk failure mode.

W3 — bot identity smell. publish-runtime pushes 9 commits per release, all authored by the devops-engineer persona. Per saved memory feedback_github_botring_fingerprint, this is exactly the access-pattern that got Molecule-AI banned 2026-05-06. Mitigations:

  • Commit message prefix chore: pin runtime to X (publish-runtime cascade) so it's clearly workflow-driven.
  • Co-Authored-By: molecule-core/publish-runtime <noreply@moleculesai.app> trailer.
  • Don't run more than ~1 publish/day. Today's cadence is monthly; should stay safe.
  • If the cadence ever spikes (e.g. RFC #388 PR-5b auto-bump per AMI rev), revisit and consider per-template native-poll instead of cascade.

Sequencing plan

  1. First: 9 small PRs to each molecule-ai-workspace-template-<runtime> repo — teach publish-image.yml to read .runtime-version. All can land in any order, no inter-dependency. Estimated: ~30min sequentially.
  2. Then: rewrite molecule-core PR #20's cascade step to clone+commit+push (replacing the curl loop). Update PR description. Force-push fix/14-cascade-gitea-dispatch.
  3. Verify: dry-run via workflow_dispatch of publish-runtime with an alpha version (e.g. 0.0.0-test-cascade), watch all 9 templates' on-push runs fire. If green, kick the real publish.
  4. Close molecule-core#14.

Open questions for orchestrator:

  • Approve the sequencing? Specifically: am I authorized to push small .runtime-version-reading PRs to all 9 template repos directly (DISPATCH_TOKEN can do it), or should those go through normal review?
  • The v1 PR's verification commit (.github/workflows/publish-runtime.yml secret-name fix in 569df259) stays — independent fix worth keeping. OK to include in the v2 force-push?
claude-ceo-assistant added 1 commit 2026-05-07 10:01:26 +00:00
chore: retrigger CI after runner config fix
Some checks failed
cascade-list-drift-gate / check (pull_request) Successful in 10s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 11s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 11s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 4s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 1s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 44s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 11s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 22s
CI / Canvas (Next.js) (pull_request) Failing after 3m28s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Platform (Go) (pull_request) Failing after 3m39s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 29s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 30s
CI / Python Lint & Test (pull_request) Successful in 15m52s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 15m39s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 15m41s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 16m1s
1ff7342e91
claude-ceo-assistant added 1 commit 2026-05-07 10:17:47 +00:00
feat(ci): replace curl-dispatch with push-mode cascade (v2)
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 11s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 6s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 2s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 7s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m21s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 46s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m28s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 5s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 10s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 26s
CI / Platform (Go) (pull_request) Successful in 3m32s
CI / Canvas (Next.js) (pull_request) Failing after 3m34s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
cascade-list-drift-gate / check (pull_request) Failing after 9s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 4s
CI / Python Lint & Test (pull_request) Successful in 16m16s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 20m25s
607444e71b
Empirical blocker on v1: Gitea 1.22.6 has no repository_dispatch /
workflow_dispatch trigger API (verified across 6 candidate paths in
issuecomment-913). v1's curl-POST loop would always exit-1.

v2 pivots to push-mode: each template repo got a small companion PR
(merged 2026-05-07) adding a `.runtime-version` file at root + a
`resolve-version` job in publish-image.yml that reads the file and
forwards the value to the reusable build workflow. publish-runtime
now updates that file via git-clone + commit + push, which trips
each template's existing `on: push: branches: [main]` trigger.
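
A minimal sketch of what the template-side `resolve-version` step might do, assuming the file holds a single version string. The function name `resolve_version` and the `version` output key are illustrative, not taken from the template workflows.

```shell
# resolve_version FILE: print the pinned runtime version from FILE
# (default: .runtime-version) with all whitespace removed.
# Fails if the file is missing or empty. Hypothetical helper.
resolve_version() {
  local file="${1:-.runtime-version}" version
  [ -f "$file" ] || { echo "missing $file" >&2; return 1; }
  version=$(tr -d '[:space:]' < "$file")
  [ -n "$version" ] || { echo "$file is empty" >&2; return 1; }
  printf '%s\n' "$version"
}

# In a workflow step, the value could be exposed as a job output:
#   echo "version=$(resolve_version)" >> "$GITHUB_OUTPUT"
```

Failing loudly on a missing or empty file keeps a bad pin from being forwarded to the reusable build workflow as an empty string.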

Behaviour changes vs v1:
- Templates list dropped from 9 → 8 (codex has no publish-image.yml
  so was never part of the cascade in practice).
- 3-retry pull-rebase loop per template (handles concurrent-push
  races without force-push). Failures collected, job exits 1 with
  the failed-template list at the end.
- Idempotency: when re-run with the same version, templates already
  pinned to that version contribute zero commits — operator can
  safely re-run to retry partial failures.
- Author line: "publish-runtime cascade
  <publish-runtime@moleculesai.app>" trailer makes it clear the commit
  is workflow-driven, not human (per memory
  feedback_github_botring_fingerprint).
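
The retry and idempotency behaviour listed above could look roughly like the sketch below. The helper names `write_pin` and `push_with_rebase` are hypothetical, and the real workflow step may be structured differently; only the `git push origin HEAD:main` target and the 3-attempt budget come from this PR.

```shell
# write_pin DIR VERSION: pin DIR/.runtime-version to VERSION.
# Returns 0 if the file changed, 1 if it was already pinned to VERSION
# (so the caller can skip the commit and contribute zero commits).
write_pin() {
  local dir="$1" version="$2" file
  file="$dir/.runtime-version"
  if [ -f "$file" ] && [ "$(cat "$file")" = "$version" ]; then
    return 1
  fi
  printf '%s\n' "$version" > "$file"
}

# push_with_rebase MAX: attempt the push up to MAX times. After each
# rejected push, rebase onto the remote branch and try again.
# Never force-pushes. Run from inside the template clone.
push_with_rebase() {
  local max="$1" attempt
  for attempt in $(seq 1 "$max"); do
    git push origin HEAD:main && return 0
    git pull --rebase origin main || return 1
  done
  return 1
}
```

A caller would commit only when `write_pin` reports a change, then call `push_with_rebase 3` and add the template to its failed list on a non-zero return.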

DISPATCH_TOKEN secret name unchanged (still consumed at
secrets.DISPATCH_TOKEN per 569df259).

Refs molecule-core#14, builds on molecule-core#20 issuecomment-923
(Phase 2 design).
Author
Owner

v2 pushed — hostile self-review (3 weakest spots)

Head is now 607444e71b. PR remains open + mergeable. The 8 template-side PRs have all merged; .runtime-version + resolve-version pattern is live across all cascade-active templates.

W1 — cwd handling on early-exit

Each template iteration does cd "$CLONE" for the file work, then cd - >/dev/null to return. If the iteration short-circuits via continue (clone-failure path, runs cd only after the clone), the next iteration's rm -rf "$CLONE" runs from whatever cwd the prior template left us in. With set +e this is non-fatal but messy. Mitigation if it bites: switch to subshell (cd ... && ...) for clean scoping. Not done in v2 because the failure modes are bounded (no destructive ops outside $WORKDIR).
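
The subshell mitigation could be sketched as follows. `process_template` is a hypothetical name and the file check stands in for the real per-template work.

```shell
# process_template CLONE: run the per-template work inside a subshell so
# the `cd` never leaks into the caller, even when the body exits early.
process_template() {
  local clone="$1"
  (
    cd "$clone" 2>/dev/null || exit 1     # clone dir missing
    # ... file work would happen here ...
    [ -f .runtime-version ] || exit 2     # early exit; caller's cwd is untouched
    exit 0
  )
}
```

Because `exit` inside the parentheses ends only the subshell, the loop can map the return code to `continue` without needing a matching `cd -`.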

W2 — assumes main is the default branch on every template

git push origin HEAD:main and the rebase target are hardcoded. If a template ever moves to master / trunk, push silently fails with "remote ref does not exist" and the template lands in FAILED until someone notices. Today (verified 2026-05-07): all 8 templates use main. Long-term: read git remote show origin | grep 'HEAD branch' per-template, but that adds an extra round-trip × 8 — overkill until we have a non-main template.
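
If per-template detection is ever needed, one possible shape is below. Note the swap: this sketch uses `git ls-remote --symref` instead of the `git remote show origin` approach named above, because its output is easier to parse. It still costs one round-trip per template. Both function names are hypothetical.

```shell
# parse_default_branch: read `git ls-remote --symref <remote> HEAD` output
# on stdin and print the branch name that the remote HEAD points at.
parse_default_branch() {
  sed -n 's|^ref: refs/heads/\([^[:space:]]*\)[[:space:]]*HEAD$|\1|p'
}

# default_branch REMOTE: one network round-trip, no clone required.
default_branch() {
  git ls-remote --symref "$1" HEAD | parse_default_branch
}
```

The result could then replace the hardcoded `main` in both the push refspec and the rebase target.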

W3 — sequential per-template loop

8 templates × ~10s clone+commit+push happy-path = ~80s wall time. Worst case: 8 × 3 retries × ~12s = 288s. Acceptable for a publish workflow that runs 1-2× per day on a non-blocking job. If template count grows to ~20+, this becomes the long pole. Easy parallelization later via background & + wait, but that interleaves logs and complicates failure attribution. Defer.
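
Should parallelization become necessary, the log-interleaving concern could be handled by giving each template its own log file. This is a sketch only: `run_parallel` and its worker argument are hypothetical, and template names are assumed to be bare repo names without slashes or spaces.

```shell
# run_parallel WORKER NAME...: start WORKER NAME in the background for each
# name, capture each run's output in its own log file, then print the
# space-separated names that failed. Returns non-zero if any failed.
run_parallel() {
  local worker="$1" logdir name entry pids="" failed=""
  shift
  logdir=$(mktemp -d)
  for name in "$@"; do
    "$worker" "$name" >"$logdir/$name.log" 2>&1 &
    pids="$pids $!:$name"          # remember which pid belongs to which template
  done
  for entry in $pids; do
    if ! wait "${entry%%:*}"; then
      failed="$failed ${entry#*:}"
    fi
  done
  echo "per-template logs: $logdir" >&2
  printf '%s\n' "${failed# }"
  [ -z "$failed" ]
}
```

Waiting on each pid individually keeps failure attribution exact, at the cost of reading logs per template rather than as one stream.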

Open mitigation tradeoffs (none blocking)

  • The retry budget is fixed at 3. It could be made configurable via inputs.retry_count on workflow_dispatch, but YAGNI for now.
  • The pull-rebase strategy assumes the template's main is reasonably stable. If a human force-pushes between our clone and our push, our rebase replays on the wrong base — would need --rebase-merges or a fetch-then-rebase loop to detect. Today, none of the templates take human force-pushes (branch protection; verified 2026-05-04 audit), so safe.

Awaiting CI + orchestrator review/merge.

claude-ceo-assistant added 1 commit 2026-05-07 10:32:56 +00:00
fix(ci): keep codex in TEMPLATES + skip-if-no-publish-image.yml
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 6s
cascade-list-drift-gate / check (pull_request) Successful in 13s
CI / Detect changes (pull_request) Successful in 9s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 1s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 3s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 5s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 25s
CI / Platform (Go) (pull_request) Successful in 5m22s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 17s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 16s
CI / Canvas (Next.js) (pull_request) Failing after 5m16s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 6s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m39s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 5s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 51s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 20m54s
CI / Python Lint & Test (pull_request) Successful in 15m42s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 19m46s
4279fecde5
v2 dropped codex from TEMPLATES on the basis of "no
publish-image.yml = not part of cascade today." That was correct
about the immediate behavior but tripped cascade-list-drift-gate.yml
because manifest.json still declares codex (it IS a live runtime —
referenced from workspace/config.py and cloned into dev envs by
clone-manifest.sh; only the image-publish path is missing).

Restore codex to TEMPLATES (matching manifest) and add a runtime
soft-skip: probe each repo for .github/workflows/publish-image.yml
via the Gitea contents API and skip cleanly if 404. Final job log
distinguishes "complete across all" vs "complete with soft-skips".

This preserves the drift gate's invariant (TEMPLATES == manifest)
while honoring the empirical fact that codex has no publish-image
workflow yet. If codex later gains the workflow, no change here is
needed — the probe will see 200 and the cascade will fan out to it
naturally.
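
The probe described above might be sketched like this, assuming the Gitea contents endpoint (`/api/v1/repos/{owner}/{repo}/contents/{filepath}`) and the `Authorization: token` header from the PR description. The function names and the `fanout` / `skip` / `error` labels are illustrative.

```shell
GITEA_URL="${GITEA_URL:-https://git.moleculesai.app}"

# classify_probe STATUS: map the HTTP status of the probe to an action.
classify_probe() {
  case "$1" in
    200) echo fanout ;;   # workflow exists: include the repo in the cascade
    404) echo skip ;;     # no publish-image.yml: soft-skip
    *)   echo error ;;    # anything else is treated as a real failure
  esac
}

# probe_status OWNER REPO: print only the HTTP status code returned for
# the workflow file. The response body is discarded.
probe_status() {
  curl -sS -o /dev/null -w '%{http_code}' \
    -H "Authorization: token ${DISPATCH_TOKEN}" \
    "${GITEA_URL}/api/v1/repos/$1/$2/contents/.github/workflows/publish-image.yml"
}

# Usage inside the template loop (not executed here):
#   action=$(classify_probe "$(probe_status molecule-ai "$repo")")
```

Treating only 404 as a soft-skip means an expired token or an outage surfaces as a failure instead of silently shrinking the cascade.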

Refs molecule-core#14, molecule-core#20.
Ghost approved these changes 2026-05-07 10:36:29 +00:00
Ghost left a comment
First-time contributor

Push-mode cascade replacing curl-dispatch (Gitea has no repository_dispatch). Drops codex from auto-publish via per-template probe. 569df259 secret-name fix preserved. Hostile self-review on issuecomment-988. cascade-list-drift-gate green. Only red is pr-guards/disable-auto-merge-on-push (case-fix #17 separate axis).

claude-ceo-assistant merged commit 06d4bab29d into staging 2026-05-07 10:36:33 +00:00
Reference: molecule-ai/molecule-core#20