Research on garrytan/gstack surfaced 5 patterns worth importing into our cron / agent setup. These are skills, not platform code: they guide how the cron and our own subagents work, not what the platform does at runtime.

## New skills

1. **cross-vendor-review**: adversarial second-model review for noteworthy PRs (auth, billing, data deletion, migrations). Catches the 15-30% of bugs single-model review misses. Inspired by gstack's /codex.
2. **careful-mode**: REFUSE/WARN/ALLOW lists for destructive commands. Refuses force-push to main, blocks merging draft PRs, prevents rm -rf outside scratch dirs. Inspired by gstack's /careful + /freeze.
3. **cron-learnings**: per-project JSONL of operational learnings appended at the end of every tick, replayed at the start of the next. Stops the cron from re-litigating decided issues. Inspired by gstack's /learn.
4. **cron-retro**: weekly retrospective auto-posted as a GitHub issue. Sunday 23:07 local. Tracks PR count, time-to-merge, gate failure trends, code-review severity over time. Inspired by gstack's /retro.
5. **llm-judge**: cheap LLM-as-judge eval to catch "agent shipped the wrong thing", the failure mode unit tests miss. Plug into issue-pickup pipeline so worker-agent draft PRs get scored before being marked ready. Inspired by gstack's tier-3 test infra.

## Cron updates (session-only, c5074cd5 + 060d136c)

- Hourly triage cron now opens with careful-mode activation + cron-learnings replay (Step 0)
- code-review skill on every PR being considered for merge (Step 2 supplement A: already present, formalized)
- cross-vendor-review on noteworthy PRs (Step 2 supplement B: new)
- llm-judge on issue-pickup draft PRs before marking ready (Step 4)
- Status report now includes cross-vendor pass/fail and llm-judge scores (Step 5)
- End-of-tick cron-learnings append (Step 5)
- New weekly cron at Sun 23:07 invokes the cron-retro skill

## What we did NOT take from gstack

- Their browser fork: not our product
- The 23 named roles: we have agent role templates already
- Bun toolchain: adds yet another runtime to our stack
- /design-shotgun and design-tool variants: we're not a design tool
- /document-release: our update-docs skill already covers this

See PR description for full research notes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| name | description |
|---|---|
| cron-retro | Weekly retrospective digest of cron activity — PRs merged, gates failed, issues picked, code-review findings by severity, time-to-merge, regression trend. Posts to a dedicated GitHub issue. Inspired by gstack's /retro. |
cron-retro
The cron runs hourly and ships a lot. Without a periodic summary, drift happens silently — Gate 4 starts failing more often, code-review noise climbs, time-to-merge balloons, and nobody notices for weeks.
When to run
- Every Sunday at 23:07 local (cron expression `7 23 * * 0`)
- On-demand by the CEO
What to compute (over the prior 7 days)
From `gh pr list --state merged --search "merged:>=YYYY-MM-DD"` and our local `cron-learnings.jsonl`:
- Merged PR count: total, plus a breakdown by category (auth/security, refactor, feat, fix, docs, infra)
- Issues closed: count, with a PR link for each
- Time-to-merge distribution: median, p90, and max, excluding docs PRs (they merge near-instantly and would skew the numbers)
- Gate failure breakdown: which gates failed and how often, plus any recurring patterns
- Code-review findings: total 🔴 / 🟡 / 🔵 across all PRs, with the trend vs the prior week
- Mechanical fixes pushed: how often the cron fixed a gate failure on-branch
- Skips by reason: categorized as design-judgment, CI-down, scope-too-open, or noteworthy-CEO-needed
- Code volume: net LOC added/removed (Garry Tan publishes these in his retros; doing the same keeps us honest)
- Test count delta: Go + Python + Vitest + Jest test counts at the start vs the end of the week
- New runtime / library / tool added or removed: anything strategic
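The time-to-merge bullet above can be sketched in a few lines. This is a minimal sketch, not the skill's actual implementation: it assumes the merged PRs were exported as JSON with `createdAt`, `mergedAt`, and `labels` fields (for example by adding `--json createdAt,mergedAt,labels` to the `gh pr list` call), and that docs PRs carry a `docs` label. The label name, the function names, and the nearest-rank percentile method are all illustrative assumptions.

```python
import math
from datetime import datetime, timedelta
from statistics import median


def parse_ts(ts):
    """Parse an ISO-8601 UTC timestamp such as 2026-04-15T10:00:00Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def time_to_merge_stats(prs, exclude_labels=("docs",)):
    """Return count / median / p90 / max time-to-merge as timedeltas.

    PRs carrying any label in `exclude_labels` are skipped. Marking docs
    PRs with a "docs" label is an assumption of this sketch.
    Returns None when no PR is left after filtering.
    """
    excluded = set(exclude_labels)
    seconds = []
    for pr in prs:
        names = {label["name"] for label in pr.get("labels", [])}
        if names & excluded:
            continue
        delta = parse_ts(pr["mergedAt"]) - parse_ts(pr["createdAt"])
        seconds.append(delta.total_seconds())
    if not seconds:
        return None
    seconds.sort()
    return {
        "count": len(seconds),
        "median": timedelta(seconds=median(seconds)),
        "p90": timedelta(seconds=percentile(seconds, 90)),
        "max": timedelta(seconds=seconds[-1]),
    }
```

Passing the result of `json.loads` on the command's output gives the list this function expects. Nearest-rank was chosen because it always returns an observed duration, which reads more naturally in a weekly digest than an interpolated value.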
Format
Post a new GitHub issue titled `Cron retro: 2026-04-14 → 2026-04-21 (week N)` with body:
# Week summary
- Merged: X PRs (Y closed issues)
- Median TTM: 3h12m (excluding docs)
- Code-review findings: 0 🔴 / 4 🟡 / 18 🔵 (vs last week: 0 / 6 / 24)
- Mechanical fixes pushed: 5
- Skips: 2 design-judgment, 1 CI-down
# Trend signals
- ↑ Frontend test coverage (+12 vitest, +1 file)
- ↓ Time-to-merge for auth PRs (down from 8h median to 3h — likely
because Gate-4 doc-sync subagent now catches missing .env entries)
- ⚠ Gate 7 (Playwright) failed 3 times this week vs 0 last week —
probably the canvas dev-server stale-chunk issue. Action item.
# Code volume
- 12,847 lines added, 8,213 removed across 23 commits
# Notes
- Closed #6, #13, #17, #23 — 4 issues from the launch backlog
- 2 issues remain in the SaaS-launch Tier 1 list (multi-tenancy, Fly Machines)
- New skills added this week: cross-vendor-review, careful-mode, cron-learnings, cron-retro
# Action items for next week
- [ ] Investigate Gate 7 flakes (likely fix: persistent canvas dev daemon)
- [ ] Pick up issue #19 (workspace restart context)
- [ ] PR #58 needs CEO review (configurable tier limits — behavior change)
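The title and the "Week summary" section of the template above could be generated along these lines. This is a sketch under stated assumptions: the `WeekStats` container, its field names, and the helper functions are hypothetical, and the remaining sections (trend signals, notes, action items) are left out because they need judgment rather than string formatting. The command list uses `gh issue create` with its `--title` and `--body` flags.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class WeekStats:
    """Hypothetical container for one week's numbers (field names are assumptions)."""
    start: date
    end: date
    week: int
    merged: int
    issues_closed: int
    median_ttm: str          # pre-formatted, e.g. "3h12m"
    findings: tuple          # (red, yellow, blue) counts for this week
    prev_findings: tuple     # same triple for the prior week
    mechanical_fixes: int


def issue_title(s):
    """Build the issue title in the format the skill describes."""
    return f"Cron retro: {s.start.isoformat()} → {s.end.isoformat()} (week {s.week})"


def week_summary(s):
    """Render the '# Week summary' section of the issue body."""
    red, yellow, blue = s.findings
    p_red, p_yellow, p_blue = s.prev_findings
    return "\n".join([
        "# Week summary",
        f"- Merged: {s.merged} PRs ({s.issues_closed} closed issues)",
        f"- Median TTM: {s.median_ttm} (excluding docs)",
        f"- Code-review findings: {red} 🔴 / {yellow} 🟡 / {blue} 🔵 "
        f"(vs last week: {p_red} / {p_yellow} / {p_blue})",
        f"- Mechanical fixes pushed: {s.mechanical_fixes}",
    ])


def gh_issue_command(s, body):
    """Argument list for posting the retro; pass it to subprocess.run to execute."""
    return ["gh", "issue", "create", "--title", issue_title(s), "--body", body]
```

Returning the command as a list rather than running it keeps the sketch side-effect free, so the rendered issue can be inspected or dry-run before anything is posted.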
Why this exists
What gets measured improves. gstack publishes weekly retros and credits them with showing where to invest. We have no equivalent. This skill is the smallest viable version: one automatically generated issue per week that costs nothing to ignore and pays off when the metrics start drifting.
Implementation note
This skill should be invoked from a separate cron job (not the hourly triage cron). Suggested cron expression: `7 23 * * 0` (Sunday 23:07 local).
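To sanity-check what the suggested schedule means, the sketch below computes the next firing time of a Sunday 23:07 job from a given local time. It is illustrative only: the function name is hypothetical, it works on naive local datetimes, and it ignores daylight-saving transitions.

```python
from datetime import datetime, timedelta


def next_sunday_2307(now):
    """Return the next Sunday 23:07 strictly after `now` (naive local time).

    Mirrors the cron expression `7 23 * * 0`: minute 7, hour 23, any day of
    month, any month, day-of-week 0 (Sunday).
    """
    # Python's weekday() counts Monday as 0 and Sunday as 6,
    # whereas cron's day-of-week field uses 0 for Sunday.
    days_ahead = (6 - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=23, minute=7, second=0, microsecond=0
    )
    if candidate <= now:
        # Already at or past this week's slot, so roll over to next Sunday.
        candidate += timedelta(days=7)
    return candidate
```

The off-the-hour minute keeps the weekly job from starting at the same moment as a top-of-the-hour tick, assuming the hourly triage cron fires at minute 0.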