molecule-core/.claude/skills/cron-retro/SKILL.md
Hongming Wang 239883920e feat(.claude): 5 gstack-inspired skills + cron upgrades
Research on garrytan/gstack surfaced 5 patterns worth importing into
our cron / agent setup. These are skills, not platform code — they
guide how the cron and our own subagents work, not what the platform
does at runtime.

## New skills

1. **cross-vendor-review** — adversarial second-model review for
   noteworthy PRs (auth, billing, data deletion, migrations). Catches
   the 15-30% of bugs single-model review misses. Inspired by
   gstack's /codex.

2. **careful-mode** — REFUSE/WARN/ALLOW lists for destructive
   commands. Refuses force-push to main, blocks merging draft PRs,
   prevents rm -rf outside scratch dirs. Inspired by gstack's
   /careful + /freeze.

3. **cron-learnings** — per-project JSONL of operational learnings
   appended at the end of every tick, replayed at the start of the
   next. Stops the cron from re-litigating decided issues.
   Inspired by gstack's /learn.

4. **cron-retro** — weekly retrospective auto-posted as a GitHub
   issue. Sunday 23:07 local. Tracks PR count, time-to-merge, gate
   failure trends, code-review severity over time. Inspired by
   gstack's /retro.

5. **llm-judge** — cheap LLM-as-judge eval to catch "agent shipped
   the wrong thing" — the failure mode unit tests miss. Plug into
   issue-pickup pipeline so worker-agent draft PRs get scored before
   being marked ready. Inspired by gstack's tier-3 test infra.

## Cron updates (session-only, c5074cd5 + 060d136c)

- Hourly triage cron now opens with careful-mode activation +
  cron-learnings replay (Step 0)
- code-review skill on every PR being considered for merge
  (Step 2 supplement A — already present, formalized)
- cross-vendor-review on noteworthy PRs (Step 2 supplement B — new)
- llm-judge on issue-pickup draft PRs before marking ready (Step 4)
- Status report now includes cross-vendor pass/fail and llm-judge
  scores (Step 5)
- End-of-tick cron-learnings append (Step 5)
- New weekly cron at Sun 23:07 invokes the cron-retro skill

## What we did NOT take from gstack

- Their browser fork — not our product
- The 23 named roles — we have agent role templates already
- Bun toolchain — adds yet another runtime to our stack
- /design-shotgun and design-tool variants — we're not a design tool
- /document-release — our update-docs skill already covers this

See PR description for full research notes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:36:55 -07:00

---
name: cron-retro
description: Weekly retrospective digest of cron activity — PRs merged, gates failed, issues picked, code-review findings by severity, time-to-merge, regression trend. Posts to a dedicated GitHub issue. Inspired by gstack's /retro.
---
# cron-retro
The cron runs hourly and ships a lot. Without a periodic summary, drift happens silently — Gate 4 starts failing more often, code-review noise climbs, time-to-merge balloons, and nobody notices for weeks.
## When to run
- Every Sunday at 23:07 local (`7 23 * * 0` cron expression)
- On-demand by the CEO
## What to compute (over the prior 7 days)
From `gh pr list --state merged --search "merged:>=YYYY-MM-DD"` and our local `cron-learnings.jsonl`:
1. **Merged PR count** — total + by category (auth/security, refactor, feat, fix, docs, infra)
2. **Issues closed** — count, with PR-link for each
3. **Time-to-merge distribution** — median, p90, max. Excluding docs PRs (they merge instantly).
4. **Gate failure breakdown** — which gates failed how often. Patterns?
5. **Code-review findings** — total 🔴 / 🟡 / 🔵 across all PRs. Trend vs prior week.
6. **Mechanical fixes pushed** — how often did the cron fix a gate failure on-branch?
7. **Skips by reason** — categorize: design-judgment, CI-down, scope-too-open, noteworthy-CEO-needed
8. **Code volume** — net LOC added/removed (Garry Tan publishes these in his retros — keep us honest)
9. **Test count delta** — Go + Python + Vitest + Jest from start to end of week
10. **New runtime / library / tool added or removed** — anything strategic
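As an illustration of item 3, here is a minimal sketch of the time-to-merge calculation. It assumes the PR data was fetched with something like `gh pr list --state merged --search "merged:>=YYYY-MM-DD" --json number,title,createdAt,mergedAt` and parsed into a list of dicts. The helper names (`merged_search_query`, `is_docs`, `time_to_merge_stats`) are hypothetical, and treating a `docs` title prefix as the marker of a docs PR is an assumption, not something this skill defines.

```python
from datetime import date, datetime, timedelta
from math import ceil
from statistics import median
from typing import Optional


def merged_search_query(today: date) -> str:
    """Build the `merged:>=YYYY-MM-DD` search term for the prior 7 days."""
    return f"merged:>={(today - timedelta(days=7)).isoformat()}"


def parse_ts(value: str) -> datetime:
    # Assumes ISO-8601 UTC timestamps such as "2026-04-14T18:36:55Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_docs(pr: dict) -> bool:
    # Assumption: docs PRs carry a conventional-commit "docs" title prefix.
    return pr["title"].lower().startswith("docs")


def time_to_merge_stats(prs: list) -> Optional[dict]:
    """Return median, p90 and max time-to-merge, skipping docs PRs."""
    durations = sorted(
        parse_ts(pr["mergedAt"]) - parse_ts(pr["createdAt"])
        for pr in prs
        if pr.get("mergedAt") and not is_docs(pr)
    )
    if not durations:
        return None
    p90_index = ceil(0.9 * len(durations)) - 1  # nearest-rank percentile
    return {
        "median": median(durations),
        "p90": durations[p90_index],
        "max": durations[-1],
    }
```

The nearest-rank percentile is used here because weekly samples are small; with a handful of PRs, p90 and max will often coincide, which is worth keeping in mind when reading the trend.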
## Format
Post a new GitHub issue titled `Cron retro: 2026-04-14 → 2026-04-21 (week N)` with body:
```markdown
# Week summary
- Merged: X PRs (Y closed issues)
- Median TTM: 3h12m (excluding docs)
- Code-review findings: 0 🔴 / 4 🟡 / 18 🔵 (vs last week: 0 / 6 / 24)
- Mechanical fixes pushed: 5
- Skips: 2 design-judgment, 1 CI-down
# Trend signals
- ↑ Frontend test coverage (+12 vitest, +1 file)
- ↓ Time-to-merge for auth PRs (down from 8h median to 3h — likely
because Gate-4 doc-sync subagent now catches missing .env entries)
- ⚠ Gate 7 (Playwright) failed 3 times this week vs 0 last week —
probably the canvas dev-server stale-chunk issue. Action item.
# Code volume
- 12,847 lines added, 8,213 removed across 23 commits
# Notes
- Closed #6, #13, #17, #23 — 4 issues from the launch backlog
- 2 issues remain in the SaaS-launch Tier 1 list (multi-tenancy, Fly Machines)
- New skills added this week: cross-vendor-review, careful-mode, cron-learnings, cron-retro
# Action items for next week
- [ ] Investigate Gate 7 flakes (likely fix: persistent canvas dev daemon)
- [ ] Pick up issue #19 (workspace restart context)
- [ ] PR #58 needs CEO review (configurable tier limits — behavior change)
```
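The title and a few of the summary lines in the template above could be rendered along these lines. This is a sketch only: the function names are hypothetical, the week number is passed in because the skill does not say how `N` is derived, and the window is assumed to be the 7 days ending on the given date.

```python
from datetime import date, timedelta


def format_duration(delta: timedelta) -> str:
    """Render a duration in the template's compact style, e.g. 3h12m."""
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h{minutes:02d}m"


def retro_title(end: date, week: int) -> str:
    """Build the issue title for the 7-day window ending on `end`."""
    start = end - timedelta(days=7)
    return f"Cron retro: {start.isoformat()} → {end.isoformat()} (week {week})"


def findings_line(current: tuple, previous: tuple) -> str:
    """Format the code-review findings line with last week's counts."""
    red, yellow, blue = current
    prev = " / ".join(str(count) for count in previous)
    return (
        f"- Code-review findings: {red} 🔴 / {yellow} 🟡 / {blue} 🔵 "
        f"(vs last week: {prev})"
    )
```

For example, `retro_title(date(2026, 4, 21), 1)` yields a title in the same shape as the one shown in the template, and `format_duration(timedelta(hours=3, minutes=12))` gives `3h12m`.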
## Why this exists
What gets measured improves. gstack publishes weekly retros and credits them with knowing where to invest. We have no analog. This is the smallest viable analog: one issue per week, generated automatically, costs nothing to ignore, valuable when the metrics start drifting.
## Implementation note
This skill should be invoked from a separate cron job (not the hourly triage cron). Suggested cron expression: `7 23 * * 0` — Sunday 23:07 local.
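One possible shape for the posting step of that separate job, assuming the rendered body has already been written to a file. The function names and the `dry_run` default are hypothetical; `gh issue create` with `--title` and `--body-file` is the only external command used.

```python
import subprocess
from typing import List


def issue_create_command(title: str, body_file: str) -> List[str]:
    """Build the argv for posting the retro as a new GitHub issue."""
    return ["gh", "issue", "create", "--title", title, "--body-file", body_file]


def post_retro(title: str, body_file: str, dry_run: bool = True) -> List[str]:
    """Post the retro, or only return the command when dry_run is set."""
    command = issue_create_command(title, body_file)
    if not dry_run:
        # check=True raises CalledProcessError if gh exits non-zero,
        # so a failed post shows up in the cron job's exit status.
        subprocess.run(command, check=True)
    return command
```

Defaulting to a dry run keeps an on-demand invocation from opening a duplicate issue by accident; the weekly job would pass `dry_run=False`.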