molecule-core/.claude/CLAUDE_LOOP_NOTES.md
Hongming Wang 239883920e feat(.claude): 5 gstack-inspired skills + cron upgrades
Research on garrytan/gstack surfaced 5 patterns worth importing into
our cron / agent setup. These are skills, not platform code — they
guide how the cron and our own subagents work, not what the platform
does at runtime.

## New skills

1. **cross-vendor-review** — adversarial second-model review for
   noteworthy PRs (auth, billing, data deletion, migrations). Catches
   the 15-30% of bugs single-model review misses. Inspired by
   gstack's /codex.

2. **careful-mode** — REFUSE/WARN/ALLOW lists for destructive
   commands. Refuses force-push to main, blocks merging draft PRs,
   prevents rm -rf outside scratch dirs. Inspired by gstack's
   /careful + /freeze.

3. **cron-learnings** — per-project JSONL of operational learnings
   appended at the end of every tick, replayed at the start of the
   next. Stops the cron from re-litigating decided issues.
   Inspired by gstack's /learn.

4. **cron-retro** — weekly retrospective auto-posted as a GitHub
   issue. Sunday 23:07 local. Tracks PR count, time-to-merge, gate
   failure trends, code-review severity over time. Inspired by
   gstack's /retro.

5. **llm-judge** — cheap LLM-as-judge eval to catch "agent shipped
   the wrong thing" — the failure mode unit tests miss. Plug into
   issue-pickup pipeline so worker-agent draft PRs get scored before
   being marked ready. Inspired by gstack's tier-3 test infra.

## Cron updates (session-only, c5074cd5 + 060d136c)

- Hourly triage cron now opens with careful-mode activation +
  cron-learnings replay (Step 0)
- code-review skill on every PR being considered for merge
  (Step 2 supplement A — already present, formalized)
- cross-vendor-review on noteworthy PRs (Step 2 supplement B — new)
- llm-judge on issue-pickup draft PRs before marking ready (Step 4)
- Status report now includes cross-vendor pass/fail and llm-judge
  scores (Step 5)
- End-of-tick cron-learnings append (Step 5)
- New weekly cron at Sun 23:07 invokes the cron-retro skill

## What we did NOT take from gstack

- Their browser fork — not our product
- The 23 named roles — we have agent role templates already
- Bun toolchain — adds yet another runtime to our stack
- /design-shotgun and design-tool variants — we're not a design tool
- /document-release — our update-docs skill already covers this

See PR description for full research notes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:36:55 -07:00

2.4 KiB
Raw Blame History

Loop discipline — process notes

2026-04-14 — gstack-inspired cron upgrades

Five new skills added under .claude/skills/ (inspired by garrytan/gstack):

  • cross-vendor-review — second-model adversarial review for noteworthy PRs (auth, billing, data deletion, migrations). Catches the 1530% of bugs single-model review misses.
  • careful-mode — REFUSE/WARN/ALLOW lists for destructive commands. Active at the start of every cron tick. Refuses force-push to main, blocks merging draft PRs, prevents rm -rf outside scratch dirs.
  • cron-learnings — per-project JSONL of operational learnings. End each cron tick by appending 13 lines; start the next tick by replaying the last 20.
  • cron-retro — weekly retrospective auto-posted as a GitHub issue. Sunday 23:07 local. Tracks PR count, time-to-merge, gate failures, code-review severity trends.
  • llm-judge — cheap LLM-as-judge eval to catch "agent shipped the wrong thing" — the failure mode unit tests miss.

Two crons govern this:

  • Hourly triage (:17 past each hour) — Step 0 activates careful-mode + replays cron-learnings; Step 2 supplements run code-review and (for noteworthy PRs) cross-vendor-review; Step 4 issue-pickup runs llm-judge before marking ready; Step 5 appends cron-learnings.
  • Weekly retro (Sunday 23:07) — invokes cron-retro skill, posts a GitHub issue.

Both crons are session-only per the runtime; re-invoke in a new session if needed.

Rule: a "skipped" PR must have a comment explaining the skip

When the hourly maintenance loop skips a PR for any reason — CI red, conflicting, merge dirty, missing tests, design drift — the FIRST skip in a session must leave a PR comment with the specific blocker and the exact fix the author needs to apply. Subsequent skips of the same PR (SHA unchanged) can be silent.

The failure mode this rule prevents: silently skipping a PR for many hours under a vague reason ("blocked / no CI / conflicting") without ever telling the author what they need to do. The PR sits indefinitely because the author has no comment to act on.

Concrete check at the top of each loop:

  • For every "known-blocked" PR I'm about to silently skip, verify there is a bot/me comment on the PR newer than the PR's head SHA that names the specific blocker. If not, that PR isn't actually blocked on the author — it's blocked on me writing the comment.

Caught 2026-04-13 on PR #114 (skipped 6+ loops with no comment).