Research on garrytan/gstack surfaced 5 patterns worth importing into our cron / agent setup. These are skills, not platform code: they guide how the cron and our own subagents work, not what the platform does at runtime.

## New skills

1. **cross-vendor-review**: adversarial second-model review for noteworthy PRs (auth, billing, data deletion, migrations). Catches the 15-30% of bugs single-model review misses. Inspired by gstack's /codex.
2. **careful-mode**: REFUSE/WARN/ALLOW lists for destructive commands. Refuses force-push to main, blocks merging draft PRs, prevents rm -rf outside scratch dirs. Inspired by gstack's /careful + /freeze.
3. **cron-learnings**: per-project JSONL of operational learnings appended at the end of every tick, replayed at the start of the next. Stops the cron from re-litigating decided issues. Inspired by gstack's /learn.
4. **cron-retro**: weekly retrospective auto-posted as a GitHub issue. Sunday 23:07 local. Tracks PR count, time-to-merge, gate failure trends, code-review severity over time. Inspired by gstack's /retro.
5. **llm-judge**: cheap LLM-as-judge eval to catch "agent shipped the wrong thing", the failure mode unit tests miss. Plug into issue-pickup pipeline so worker-agent draft PRs get scored before being marked ready. Inspired by gstack's tier-3 test infra.

## Cron updates (session-only, c5074cd5 + 060d136c)

- Hourly triage cron now opens with careful-mode activation + cron-learnings replay (Step 0)
- code-review skill on every PR being considered for merge (Step 2 supplement A, already present, now formalized)
- cross-vendor-review on noteworthy PRs (Step 2 supplement B, new)
- llm-judge on issue-pickup draft PRs before marking ready (Step 4)
- Status report now includes cross-vendor pass/fail and llm-judge scores (Step 5)
- End-of-tick cron-learnings append (Step 5)
- New weekly cron at Sun 23:07 invokes the cron-retro skill

## What we did NOT take from gstack

- Their browser fork: not our product
- The 23 named roles: we have agent role templates already
- Bun toolchain: adds yet another runtime to our stack
- /design-shotgun and design-tool variants: we're not a design tool
- /document-release: our update-docs skill already covers this

See PR description for full research notes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
70 lines
3.2 KiB
Markdown
---
name: cron-retro
description: Weekly retrospective digest of cron activity (PRs merged, gates failed, issues picked, code-review findings by severity, time-to-merge, regression trend). Posts to a dedicated GitHub issue. Inspired by gstack's /retro.
---

# cron-retro

The cron runs hourly and ships a lot. Without a periodic summary, drift happens silently: Gate 4 starts failing more often, code-review noise climbs, time-to-merge balloons, and nobody notices for weeks.

## When to run

- Every Sunday at 23:07 local (`7 23 * * 0` cron expression)
- On-demand by the CEO

## What to compute (over the prior 7 days)

From `gh pr list --state merged --search "merged:>=YYYY-MM-DD"` and our local `cron-learnings.jsonl`:

1. **Merged PR count**: total + by category (auth/security, refactor, feat, fix, docs, infra)
2. **Issues closed**: count, with PR-link for each
3. **Time-to-merge distribution**: median, p90, max, excluding docs PRs (they merge instantly)
4. **Gate failure breakdown**: which gates failed and how often, plus any patterns
5. **Code-review findings**: total 🔴 / 🟡 / 🔵 across all PRs, with trend vs prior week
6. **Mechanical fixes pushed**: how often did the cron fix a gate failure on-branch?
7. **Skips by reason**: categorized as design-judgment, CI-down, scope-too-open, noteworthy-CEO-needed
8. **Code volume**: net LOC added/removed (Garry Tan publishes these in his retros; doing the same keeps us honest)
9. **Test count delta**: Go + Python + Vitest + Jest from start to end of week
10. **New runtime / library / tool added or removed**: anything strategic
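
As a rough illustration of item 3, the sketch below computes median, p90, and max time-to-merge from `gh pr list` JSON output. Several details are assumptions rather than part of this skill: the helper names (`fetch_merged_prs`, `is_docs`, `ttm_stats`), the `--limit 200` cap, the nearest-rank p90 method, and the idea that docs PRs carry a label literally named `docs`.

```python
import json
import math
import statistics
import subprocess
from datetime import datetime, timedelta
from typing import Optional


def fetch_merged_prs(since: str) -> list:
    """Return PRs merged on or after `since` (YYYY-MM-DD) via the gh CLI."""
    out = subprocess.run(
        ["gh", "pr", "list", "--state", "merged",
         "--search", f"merged:>={since}",
         "--json", "number,title,createdAt,mergedAt,labels",
         "--limit", "200"],  # assumed cap; raise it for busier repos
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def parse_ts(ts: str) -> datetime:
    # gh emits ISO-8601 UTC timestamps such as "2026-04-14T01:00:00Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def is_docs(pr: dict) -> bool:
    # Assumption: docs PRs are identified by a label named "docs".
    return any(label["name"] == "docs" for label in pr.get("labels", []))


def ttm_stats(prs: list) -> Optional[dict]:
    """Median / p90 / max time-to-merge, excluding docs PRs."""
    seconds = sorted(
        (parse_ts(pr["mergedAt"]) - parse_ts(pr["createdAt"])).total_seconds()
        for pr in prs
        if not is_docs(pr)
    )
    if not seconds:
        return None
    # Nearest-rank percentile: smallest value with at least 90% at or below it.
    p90_index = math.ceil(0.9 * len(seconds)) - 1
    return {
        "median": timedelta(seconds=statistics.median(seconds)),
        "p90": timedelta(seconds=seconds[p90_index]),
        "max": timedelta(seconds=seconds[-1]),
    }
```

With only a handful of PRs per week, p90 and max will often coincide, so the median is probably the more useful trend signal.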

## Format

Post a new GitHub issue titled `Cron retro: 2026-04-14 → 2026-04-21 (week N)` with body:

```markdown
# Week summary
- Merged: X PRs (Y closed issues)
- Median TTM: 3h12m (excluding docs)
- Code-review findings: 0 🔴 / 4 🟡 / 18 🔵 (vs last week: 0 / 6 / 24)
- Mechanical fixes pushed: 5
- Skips: 2 design-judgment, 1 CI-down

# Trend signals
- ↑ Frontend test coverage (+12 vitest, +1 file)
- ↓ Time-to-merge for auth PRs (down from 8h median to 3h, likely
  because Gate-4 doc-sync subagent now catches missing .env entries)
- ⚠ Gate 7 (Playwright) failed 3 times this week vs 0 last week,
  probably the canvas dev-server stale-chunk issue. Action item.

# Code volume
- 12,847 lines added, 8,213 removed across 23 commits

# Notes
- Closed #6, #13, #17, #23 (4 issues from the launch backlog)
- 2 issues remain in the SaaS-launch Tier 1 list (multi-tenancy, Fly Machines)
- New skills added this week: cross-vendor-review, careful-mode, cron-learnings, cron-retro

# Action items for next week
- [ ] Investigate Gate 7 flakes (likely fix: persistent canvas dev daemon)
- [ ] Pick up issue #19 (workspace restart context)
- [ ] PR #58 needs CEO review (configurable tier limits, a behavior change)
```
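
The title and the "Week summary" section of the template above could be rendered from computed metrics roughly as follows. This is a minimal sketch: the function names and the shape of the `metrics` dict are hypothetical, and because the meaning of "week N" is not defined here, the week number is passed in by the caller.

```python
from datetime import date, timedelta


def fmt_duration(td: timedelta) -> str:
    """Format a duration as e.g. '3h12m'."""
    total_minutes = int(td.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h{minutes:02d}m"


def retro_title(start: date, end: date, week: int) -> str:
    # `week` is supplied by the caller; how it is numbered is an assumption.
    return f"Cron retro: {start.isoformat()} → {end.isoformat()} (week {week})"


def week_summary(metrics: dict) -> str:
    """Render the '# Week summary' section from a (hypothetical) metrics dict."""
    now, prior = metrics["findings"], metrics["prior_findings"]
    skips = ", ".join(f"{n} {reason}" for reason, n in metrics["skips"].items())
    lines = [
        "# Week summary",
        f"- Merged: {metrics['merged']} PRs ({metrics['issues_closed']} closed issues)",
        f"- Median TTM: {fmt_duration(metrics['median_ttm'])} (excluding docs)",
        f"- Code-review findings: {now['red']} 🔴 / {now['yellow']} 🟡 / {now['blue']} 🔵 "
        f"(vs last week: {prior['red']} / {prior['yellow']} / {prior['blue']})",
        f"- Mechanical fixes pushed: {metrics['mechanical_fixes']}",
        f"- Skips: {skips}",
    ]
    return "\n".join(lines)
```

The resulting strings could then be handed to `gh issue create --title ... --body ...`. The remaining sections (trend signals, notes, action items) need judgment rather than formatting, so they are left out of the sketch.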

## Why this exists

What gets measured improves. gstack publishes weekly retros and credits them with knowing where to invest. We have no analog. This is the smallest viable analog: one issue per week, generated automatically, costs nothing to ignore, valuable when the metrics start drifting.

## Implementation note
This skill should be invoked from a separate cron job (not the hourly triage cron). Suggested cron expression: `7 23 * * 0` — Sunday 23:07 local.
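
To make the schedule concrete, here is a small sketch of what `7 23 * * 0` means in terms of local time, and how the 7-day reporting window and the `merged:>=` search qualifier could be derived from the run time. The function names are hypothetical, and treating the window as "run date minus 7 days" is an assumption about how "prior 7 days" should be read.

```python
from datetime import date, datetime, timedelta


def matches_schedule(now: datetime) -> bool:
    """True when `now` falls on the minute described by `7 23 * * 0`.

    Cron numbers Sunday as 0, while datetime.weekday() numbers it as 6.
    """
    return now.weekday() == 6 and now.hour == 23 and now.minute == 7


def reporting_window(run_at: datetime) -> tuple:
    """Return (since, until) dates covering the 7 days before the run."""
    until = run_at.date()
    since = until - timedelta(days=7)  # assumed reading of "prior 7 days"
    return since, until


def merged_search(since: date) -> str:
    """Build the search qualifier used with `gh pr list --search`."""
    return f"merged:>={since.isoformat()}"
```

The off-the-hour minute keeps the weekly job from firing at the same instant as a triage tick that runs on the hour, which is presumably why 23:07 was chosen over 23:00.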