Hongming Wang 239883920e feat(.claude): 5 gstack-inspired skills + cron upgrades

Research on garrytan/gstack surfaced 5 patterns worth importing into
our cron / agent setup. These are skills, not platform code — they
guide how the cron and our own subagents work, not what the platform
does at runtime.

## New skills

1. **cross-vendor-review** — adversarial second-model review for
   noteworthy PRs (auth, billing, data deletion, migrations). Catches
   the 15-30% of bugs single-model review misses. Inspired by
   gstack's /codex.

2. **careful-mode** — REFUSE/WARN/ALLOW lists for destructive
   commands. Refuses force-push to main, blocks merging draft PRs,
   prevents rm -rf outside scratch dirs. Inspired by gstack's
   /careful + /freeze.

3. **cron-learnings** — per-project JSONL of operational learnings
   appended at the end of every tick, replayed at the start of the
   next. Stops the cron from re-litigating decided issues.
   Inspired by gstack's /learn.

4. **cron-retro** — weekly retrospective auto-posted as a GitHub
   issue. Sunday 23:07 local. Tracks PR count, time-to-merge, gate
   failure trends, code-review severity over time. Inspired by
   gstack's /retro.

5. **llm-judge** — cheap LLM-as-judge eval to catch "agent shipped
   the wrong thing" — the failure mode unit tests miss. Plug into
   issue-pickup pipeline so worker-agent draft PRs get scored before
   being marked ready. Inspired by gstack's tier-3 test infra.

## Cron updates (session-only, c5074cd5 + 060d136c)

- Hourly triage cron now opens with careful-mode activation +
  cron-learnings replay (Step 0)
- code-review skill on every PR being considered for merge
  (Step 2 supplement A — already present, formalized)
- cross-vendor-review on noteworthy PRs (Step 2 supplement B — new)
- llm-judge on issue-pickup draft PRs before marking ready (Step 4)
- Status report now includes cross-vendor pass/fail and llm-judge
  scores (Step 5)
- End-of-tick cron-learnings append (Step 5)
- New weekly cron at Sun 23:07 invokes the cron-retro skill

## What we did NOT take from gstack

- Their browser fork — not our product
- The 23 named roles — we have agent role templates already
- Bun toolchain — adds yet another runtime to our stack
- /design-shotgun and design-tool variants — we're not a design tool
- /document-release — our update-docs skill already covers this

See PR description for full research notes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-14 11:36:55 -07:00

2.8 KiB


name: cron-learnings
description: At the end of every cron tick, append 1-3 lines of operational learnings (what worked, what surprised, what should change next tick) to a per-project JSONL. Replay at start of next tick. Inspired by gstack's /learn skill.

cron-learnings

Each tick, the cron does a lot of work, and much of what it learns is lost before the next tick starts. This skill is the compounding layer: it persists those lessons so each tick builds on the previous ones.

Storage

Per-project file at:

~/.claude/projects/<sanitized-project-path>/memory/cron-learnings.jsonl

For molecule-monorepo, that's:

~/.claude/projects/-Users-hongming-Documents-GitHub-molecule-monorepo/memory/cron-learnings.jsonl
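As a rough sketch, the path could be derived as below. The function name `learnings_path` is hypothetical, and the sanitization rule (replace every "/" with "-") is an assumption inferred from the molecule-monorepo example above, not a documented rule.

```python
from pathlib import Path
from typing import Optional


def learnings_path(project_dir: str, home: Optional[Path] = None) -> Path:
    """Return the cron-learnings.jsonl path for a project directory."""
    base = home if home is not None else Path.home()
    # Assumption: "sanitized" means each "/" becomes "-", as in the example path.
    sanitized = project_dir.replace("/", "-")
    return base / ".claude" / "projects" / sanitized / "memory" / "cron-learnings.jsonl"
```

The `home` parameter exists only so the function can be exercised without touching the real home directory.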

One JSON object per line:

{"ts": "2026-04-14T05:17:00Z", "tick_id": "5939aa3f-001", "category": "gate-fail", "summary": "Gate 4 (security) flagged token!=secret in PR #28; requireInternalAPISecret needs subtle.ConstantTimeCompare", "next_action": "When reviewing auth-gate code, grep for `subtle.ConstantTimeCompare`. Flag plain == on tokens."}

Categories:

gate-fail — a verification gate caught something
mechanical-fix — fixed a gate failure on-branch
false-positive — a code-review finding turned out to be wrong; record so we don't keep flagging it
tool-error — an MCP tool / CLI flaked; note the workaround
repo-state — something about the repo's state that next tick should know
pattern — a cross-PR pattern worth remembering (e.g., "every cron loop adds itself as noreply@anthropic.com; reviewers OK with it")
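One way to build a record in this shape is sketched below. The helper name `make_record` is hypothetical; the field names and category list are taken from the example and list above. Rejecting unknown categories is an assumption about how strict the cron should be.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# The six categories listed above.
CATEGORIES = {
    "gate-fail", "mechanical-fix", "false-positive",
    "tool-error", "repo-state", "pattern",
}


def make_record(tick_id: str, category: str, summary: str,
                next_action: str, now: Optional[datetime] = None) -> str:
    """Serialize one learning as a single JSONL line (no trailing newline)."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    record = {
        "ts": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "tick_id": tick_id,
        "category": category,
        "summary": summary,
        "next_action": next_action,
    }
    # json.dumps escapes newlines and non-ASCII by default, so the
    # result is always a single ASCII line.
    return json.dumps(record)
```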

When to write

End of every cron tick (Step 5 of the cron prompt). 1-3 lines max — be terse.
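The end-of-tick write could look roughly like this. `append_learnings` is a hypothetical helper, not part of the skill. Treating an empty list as a no-op is an assumption for ticks that produced nothing worth logging.

```python
import json
from pathlib import Path

MAX_PER_TICK = 3  # "1-3 lines max"


def append_learnings(path: Path, records: list) -> int:
    """Append up to MAX_PER_TICK records, one JSON object per line."""
    if len(records) > MAX_PER_TICK:
        raise ValueError(f"at most {MAX_PER_TICK} learnings per tick")
    if not records:
        return 0  # assumption: nothing to log is acceptable
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="ascii") as f:
        for record in records:
            # Default json.dumps output is ASCII-only and single-line.
            f.write(json.dumps(record) + "\n")
    return len(records)
```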

When to read

Start of every cron tick. Read the last 20 lines (most recent first) before Step 1. Use them to:

Skip false-positive paths the previous tick flagged
Apply learned patterns (e.g., "PR #28 found INTERNAL_API_SECRET missing from .env.example — when reviewing future security PRs, always check .env.example sync as a first move")
Avoid re-litigating decided design choices
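The replay step above can be sketched as follows. `recent_learnings` is a hypothetical name. Returning an empty list when the file does not exist yet is an assumption for the first tick of a project.

```python
import json
from collections import deque
from pathlib import Path


def recent_learnings(path: Path, limit: int = 20) -> list:
    """Return the last `limit` learnings, most recent first."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        # A bounded deque keeps only the final `limit` non-blank lines.
        tail = deque((line for line in f if line.strip()), maxlen=limit)
    return [json.loads(line) for line in reversed(tail)]
```

A tick could then filter the result by category, for example collecting every `false-positive` entry before running code review.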

Pruning

Cap the file at 500 lines. Once the file exceeds the cap, the next write also drops the oldest 100 lines. The point is recent operational memory, not an audit log.
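A minimal sketch of a write that prunes, assuming the cap is checked against the file as it stands before the new lines are added. `write_with_prune` is a hypothetical name, and the temp-file-plus-rename step is an added safety measure that the skill does not specify.

```python
import os
from pathlib import Path

CAP = 500   # maximum lines before pruning kicks in
DROP = 100  # oldest lines removed when over the cap


def write_with_prune(path: Path, new_lines: list) -> int:
    """Append new_lines, dropping the oldest DROP lines if over CAP.

    Returns the number of lines in the file afterwards.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    if len(existing) > CAP:
        existing = existing[DROP:]  # oldest entries are at the top
    kept = existing + list(new_lines)
    # Write to a temp file and rename so a crash cannot truncate the log.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text("".join(line + "\n" for line in kept), encoding="utf-8")
    os.replace(tmp, path)
    return len(kept)
```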

Format discipline

One line per event
ASCII-only for grep-friendliness
No PII, no tokens, no URLs with auth
summary is what HAPPENED; next_action is what FUTURE-YOU should DO
If you can't think of a concrete next_action, it's not worth logging
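Some of these rules can be checked mechanically before a write. The sketch below uses a hypothetical `check_record` helper. It covers only ASCII, single-line, non-empty fields, and URLs with embedded credentials. Detecting PII or bare tokens in general is out of scope here, and the regex is a heuristic, not a complete check.

```python
import re

# Heuristic: scheme://userinfo@host, e.g. https://user:pw@example.com
URL_WITH_AUTH = re.compile(r"[a-z][a-z0-9+.-]*://[^/\s@]+@", re.IGNORECASE)


def check_record(record: dict) -> list:
    """Return a list of format-discipline problems (empty if none found)."""
    problems = []
    for field in ("summary", "next_action"):
        value = record.get(field, "")
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field} is missing or empty")
            continue
        if not value.isascii():
            problems.append(f"{field} contains non-ASCII characters")
        if "\n" in value or "\r" in value:
            problems.append(f"{field} spans more than one line")
        if URL_WITH_AUTH.search(value):
            problems.append(f"{field} contains a URL with credentials")
    return problems
```

A record with an empty `next_action` is reported as a problem, which matches the last rule above: if there is no concrete next action, the entry is not worth logging.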

Why this exists

gstack's /learn showed that AI sessions repeatedly make the same mistakes because the lessons live only in the conversation that produced them. Writing them to disk lets every tick start with the most recent lessons from prior ticks, for the cost of reading a short file. The awareness MCP we have is fine for cross-session human/agent memory; this file is specifically for the cron's own automation.
