feat(plugins): bundle hermes-achievements + scan full session history (#17754)

* feat(plugins): bundle hermes-achievements, scan full session history

Ships @PCinkusz's hermes-achievements dashboard plugin (https://github.com/PCinkusz/hermes-achievements) as a bundled plugin at plugins/hermes-achievements/. It also fixes a bug in the scan path that let the plugin see only the first 200 sessions, which made lifetime badges (50k tool calls, 75k errors, etc.) unreachable on long-running installs.

Changes:

- plugins/hermes-achievements/: vendor v0.3.1 verbatim (manifest, dist/, plugin_api.py, tests, docs, README).
- plugins/hermes-achievements/dashboard/plugin_api.py:
  * scan_sessions(): limit=None now scans ALL sessions via SQLite LIMIT -1. Previously capped at 200, so users with 8000+ sessions saw ~2% of their history.
  * evaluate_all(): first-ever scans run in a background thread so the dashboard request path never blocks. Stale snapshots serve immediately while a background refresh runs. force=True still blocks synchronously for manual /rescan.
  * _build_pending_snapshot(), _start_background_scan(), _run_scan_and_update_cache(): supporting plumbing + idempotent thread spawn.
- tests/plugins/test_achievements_plugin.py: new tests covering the 200-cap regression, the background-scan first-run flow, stale-serve-plus-background-refresh, forced sync rescan, and scan-thread idempotency.
- website/docs/user-guide/features/built-in-plugins.md: lists hermes-achievements in the bundled-plugins table and documents API endpoints, state files, and performance characteristics.
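
The `LIMIT -1` fix above can be sketched in a few lines: SQLite treats a negative `LIMIT` as unbounded, so `limit=None` maps to `-1` instead of a hard cap. The table name, columns, and `fetch_session_ids` helper below are illustrative stand-ins, not the real Hermes schema or API:

```python
import sqlite3

def fetch_session_ids(conn, limit=None):
    """limit=None maps to SQLite's LIMIT -1 ("no limit") instead of a
    hard cap. Table and column names are illustrative."""
    effective = -1 if limit is None else int(limit)
    rows = conn.execute(
        "SELECT id FROM sessions ORDER BY updated_at DESC LIMIT ?",
        (effective,),
    ).fetchall()
    return [row[0] for row in rows]

# in-memory demo with 300 sessions
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT, updated_at REAL)")
conn.executemany("INSERT INTO sessions VALUES (?, ?)",
                 [(f"s{i}", float(i)) for i in range(300)])
all_ids = fetch_session_ids(conn, limit=None)   # scans everything
capped = fetch_session_ids(conn, limit=200)     # old capped behavior
```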

E2E validated against a real 8564-session ~6.4GB state.db:
  * Cold scan: 13m 19s (one-time, backgrounded — UI never blocks)
  * Warm rescan: 1.47s (8563/8564 sessions reused from checkpoint cache)
  * 57/60 achievements unlocked, 3 discovered — aggregates like total_tool_calls=259958, total_errors=164213, skill_events=368243 correctly surface lifetime badges that the 200-cap made unreachable.

Original credit: @PCinkusz (MIT-licensed). Upstream repo remains the staging ground for new badges; this bundle keeps the dashboard in feature parity with Hermes core changes.

* feat(achievements): publish partial snapshots during cold scan

Previously a cold scan on a large session DB (13min on 8564 sessions)
showed zero badges for the entire duration, then every badge at once
when the scan completed. A dashboard refresh mid-scan was indistinguishable
from a fresh install with no history.

Now the scanner publishes a partial snapshot to _SNAPSHOT_CACHE every
250 sessions, so each refresh during a cold scan surfaces more badges
incrementally.

Mechanism:
- scan_sessions() takes an optional progress_callback fired every
  progress_every sessions with (sessions_so_far, scanned, total).
- _compute_from_scan() is extracted from compute_all() and gains an
  is_partial flag that skips writing to state.json — we don't want
  to record unlocked_at based on a half-complete aggregate that a
  later session might rebalance.
- _run_scan_and_update_cache() installs a publisher callback that
  builds a partial snapshot, marks it mode='in_progress', and writes
  it to the cache with age=0 so the UI keeps polling /scan-status
  and picks up the final snapshot when the scan completes.
- Manual /rescan (force=True) disables partial publishing — the
  caller is blocking on the final result anyway.

E2E against real 8564-session state.db (polled cache every 10s):
  t=10s: cache empty
  t=20s: 250/8564 scanned, 35 unlocked, 25 discovered
  t=40s: 500/8564 scanned, 42 unlocked, 18 discovered
  t=60s: 1000/8564 scanned, 49 unlocked, 11 discovered
  ...

Tests: 9/9 pass (2 new — partial snapshot publication + no-persist-on-partial).
Upstream unittest suite: 10/10 pass.

* feat(achievements): in-progress scan banner with live % progress

Previously the dashboard showed zero badges silently during long cold
scans (13min on 8564 sessions). The backend was publishing partial
snapshots every 250 sessions, but the bundled UI didn't surface any
indicator that a scan was running — it just rendered the main page
with whatever counts were currently published and no way for the user
to know more progress was coming.

UI changes (dist/index.js, dist/style.css):

- Added a scan-in-progress banner rendered between the hero and stats
  when scan_meta.mode is 'pending' or 'in_progress'. Shows:
    BUILDING ACHIEVEMENT PROFILE…
    Scanned 1,750 of 8,564 sessions · 20%. Badges unlock as more history streams in.
  with a pulsing teal indicator and a filling teal/cyan progress bar.
  Disappears the moment the backend flips to 'full' or 'incremental'.

- Added an auto-poller via useEffect — while scanInFlight is true the
  page re-fetches /achievements every 4s WITHOUT toggling the loading
  skeleton, so unlock counts tick up visibly without the user refreshing.
  The effect cleans itself up when the scan finishes.

- Added refresh() (re-fetch, no loading flip) alongside the existing
  load() (full reload, used by the Rescan button).

Attribution preserved:

- Added a header comment to index.js crediting @PCinkusz
  (https://github.com/PCinkusz/hermes-achievements, MIT) as the
  original author, noting the banner is a layered addition on top
  of the original dist bundle.
- Matching header comment in style.css, flagging the new
  .ha-scan-banner* rules as the local addition.

Live-verified end to end:

- Spun up `hermes dashboard --port 9229 --no-open` against a fresh
  HERMES_HOME symlinked to the real 8564-session state.db.
- Opened /achievements in a browser, confirmed the banner renders with
  live progress: 'Scanned 1,000 of 8,564 sessions · 11%' → updates to
  '1,250 ... · 14%' → '1,750 ... · 20%' without user interaction,
  matching the backend's partial publications.
- Stats row simultaneously climbed from 35 → 49 → 53 unlocked as
  more history streamed in.
- Vision analysis of the rendered page confirms the banner styling
  matches the rest of the dashboard (dark card bg, teal accent, same
  small-caps typography, pulsing indicator reusing ha-pulse keyframes).
This commit is contained in:
Teknium 2026-04-29 23:23:57 -07:00 committed by GitHub
parent ce0c3ae493
commit 62a5d7207d
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
14 changed files with 2828 additions and 0 deletions


@ -0,0 +1,21 @@
MIT License
Copyright (c) 2026 Hermes Achievements contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@ -0,0 +1,148 @@
# Hermes Achievements
> **Bundled with Hermes Agent.** Originally authored by [@PCinkusz](https://github.com/PCinkusz) at https://github.com/PCinkusz/hermes-achievements — vendored into `plugins/hermes-achievements/` so it ships with the dashboard out-of-the-box and stays in lockstep with Hermes feature changes. Upstream repo remains the staging ground for new badges and UI iteration.
>
> When Hermes is installed via `pip install hermes-agent` or cloned from source, this plugin auto-registers as a dashboard tab on first `hermes dashboard` launch. No separate install step. See [Built-in Plugins → hermes-achievements](../../website/docs/user-guide/features/built-in-plugins.md) in the main docs.
Achievement system for the Hermes Dashboard: collectible, tiered badges generated from real local Hermes session history.
![Hermes Achievements dashboard](docs/assets/achievements-dashboard-hd.png)
The screenshots use temporary demo tier data to show the full visual range. The plugin itself reads real local Hermes session history by default.
> **Update notice (2026-04-29):** If you installed this plugin before today, update to the latest version. The achievements scan path was refactored for much faster warm loads (snapshot cache + incremental checkpoint scan).
## What it does
Hermes Achievements scans local Hermes sessions and unlocks badges based on real agent behavior:
- autonomous tool chains
- debugging and recovery patterns
- vibe-coding file edits
- Hermes-native skills, memory, cron, and plugin usage
- web research and browser automation
- model/provider workflows
- lifestyle patterns such as weekend or night sessions
Achievements have three visible states:
- **Unlocked** — earned at least one tier
- **Discovered** — known achievement, progress visible, not earned yet
- **Secret** — hidden until Hermes detects the first related signal
Most achievements level through:
```text
Copper → Silver → Gold → Diamond → Olympian
```
Each card has a collapsible **What counts** section showing the exact tracked metric or requirement once the user wants details.
Version `0.2.x` expands the catalog to 60+ achievements, including model/provider badges such as **Five-Model Flight**, **Provider Polyglot**, **Claude Confidant**, **Gemini Cartographer**, and **Open Weights Pilgrim**.
## Examples
- Let Him Cook
- Toolchain Maxxer
- Red Text Connoisseur
- Port 3000 Is Taken
- This Was Supposed To Be Quick
- One More Small Change
- Skillsmith
- Memory Keeper
- Context Dragon
- Plugin Goblin
- Rabbit Hole Certified
## Install
Clone into your Hermes plugins directory:
```bash
git clone https://github.com/PCinkusz/hermes-achievements ~/.hermes/plugins/hermes-achievements
```
For local development, keep the repo elsewhere and symlink it:
```bash
git clone https://github.com/PCinkusz/hermes-achievements ~/hermes-achievements
ln -s ~/hermes-achievements ~/.hermes/plugins/hermes-achievements
```
Then rescan dashboard plugins:
```bash
curl http://127.0.0.1:9119/api/dashboard/plugins/rescan
```
If backend API routes return 404, restart `hermes dashboard`; plugin APIs are mounted at dashboard startup.
## Updating
If you installed with git:
```bash
cd ~/.hermes/plugins/hermes-achievements
git pull --ff-only
curl http://127.0.0.1:9119/api/dashboard/plugins/rescan
```
If the update changes backend routes or `plugin_api.py`, restart `hermes dashboard` after pulling.
As of 2026-04-29, updating is strongly recommended because scan performance changed significantly:
- removed duplicate `/overview` scan path
- added cached `/achievements` snapshot
- added incremental checkpoint reuse for unchanged sessions
Achievement unlock state is stored locally in `state.json` and is not overwritten by git updates. New achievements are evaluated from your existing Hermes session history. Achievement IDs are stable and should not be renamed casually because they are the unlock-state keys.
Releases are tagged in git, for example:
```bash
git fetch --tags
git checkout v0.2.0
```
## Files
```text
dashboard/
├── manifest.json
├── plugin_api.py
└── dist/
├── index.js
└── style.css
```
## API
Routes are mounted under:
```text
/api/plugins/hermes-achievements/
```
Endpoints:
```text
GET /achievements
GET /scan-status
GET /recent-unlocks
GET /sessions/{session_id}/badges
POST /rescan
POST /reset-state
```
## Development
Run checks:
```bash
node --check dashboard/dist/index.js
python3 -m py_compile dashboard/plugin_api.py
python3 -m unittest tests/test_achievement_engine.py -v
```
## License
MIT

File diff suppressed because one or more lines are too long


@ -0,0 +1,120 @@
/* hermes-achievements dashboard styles
* Originally authored by @PCinkusz https://github.com/PCinkusz/hermes-achievements (MIT).
* Bundled into hermes-agent. The in-progress scan banner rules at the bottom
* (.ha-scan-banner*) are a small addition layered on top of the original bundle.
*/
.ha-page { display: flex; flex-direction: column; gap: 1rem; }
.ha-hero { position: relative; overflow: hidden; display: flex; align-items: flex-end; justify-content: space-between; gap: 1rem; border: 1px solid var(--color-border); background: radial-gradient(circle at 12% 0, rgba(103,232,249,.13), transparent 30%), linear-gradient(135deg, color-mix(in srgb, var(--color-card) 88%, transparent), color-mix(in srgb, var(--color-primary) 10%, transparent)); padding: 1.25rem; }
.ha-hero:before { content: ""; position: absolute; inset: auto -10% -80% -10%; height: 180%; pointer-events: none; background: radial-gradient(circle, rgba(242,201,76,.12), transparent 55%); }
.ha-hero h1 { position: relative; margin: 0; font-size: clamp(2rem, 4vw, 4.2rem); line-height: .9; letter-spacing: -0.06em; }
.ha-hero p { position: relative; max-width: 52rem; margin: .65rem 0 0; color: var(--color-muted-foreground); }
.ha-kicker { position: relative; color: var(--color-muted-foreground); text-transform: uppercase; letter-spacing: .18em; font-size: .72rem; font-family: var(--font-mono, ui-monospace, monospace); }
.ha-refresh { position: relative; white-space: nowrap; }
.ha-stats { display: grid; grid-template-columns: repeat(5, minmax(0, 1fr)); gap: .75rem; }
.ha-stat-content { padding: 1rem !important; }
.ha-stat-label { color: var(--color-muted-foreground); font-size: .75rem; text-transform: uppercase; letter-spacing: .12em; }
.ha-stat-value { margin-top: .35rem; font-size: 1.4rem; font-weight: 750; letter-spacing: -0.035em; }
.ha-stat-hint { margin-top: .2rem; color: var(--color-muted-foreground); font-size: .75rem; }
.ha-toolbar { display: flex; justify-content: space-between; gap: .75rem; align-items: center; flex-wrap: wrap; }
.ha-pills { display: flex; gap: .35rem; flex-wrap: wrap; }
.ha-pills button { border: 1px solid var(--color-border); background: color-mix(in srgb, var(--color-card) 72%, transparent); color: var(--color-muted-foreground); padding: .35rem .6rem; font-size: .78rem; cursor: pointer; }
.ha-pills button.active, .ha-pills button:hover { color: var(--color-foreground); border-color: var(--ha-tier, var(--color-ring)); background: color-mix(in srgb, var(--color-primary) 16%, var(--color-card)); }
.ha-grid { display: grid; grid-template-columns: repeat(auto-fill, minmax(320px, 1fr)); gap: .9rem; }
.ha-card { --ha-tier: var(--color-border); position: relative; overflow: hidden; min-height: 214px; border: 1px solid color-mix(in srgb, var(--ha-tier) 46%, var(--color-border)); background: radial-gradient(circle at 2.6rem 2.2rem, color-mix(in srgb, var(--ha-tier) 16%, transparent), transparent 34%), linear-gradient(180deg, rgba(255,255,255,.04), transparent), color-mix(in srgb, var(--color-card) 92%, #000); transition: transform .16s ease, border-color .16s ease, opacity .16s ease, box-shadow .16s ease; }
.ha-card:hover { transform: translateY(-2px); border-color: var(--ha-tier); box-shadow: 0 0 0 1px color-mix(in srgb, var(--ha-tier) 16%, transparent); }
.ha-card-content { position: relative; z-index: 1; padding: 1rem !important; display: flex; flex-direction: column; gap: .75rem; height: 100%; }
.ha-card-head { display: grid; grid-template-columns: 3.1rem minmax(0, 1fr) auto; gap: .85rem; align-items: start; }
.ha-icon { display: grid; place-items: center; width: 2.9rem; height: 2.9rem; color: var(--ha-tier); }
.ha-lucide { width: 1.78rem; height: 1.78rem; stroke: currentColor; stroke-width: 2.15; filter: drop-shadow(0 0 8px color-mix(in srgb, var(--ha-tier) 24%, transparent)); }
.ha-card-title { font-weight: 780; line-height: 1.05; letter-spacing: -0.025em; }
.ha-card-category { margin-top: .28rem; color: var(--color-muted-foreground); font-size: .76rem; }
.ha-badges { display: flex; flex-direction: column; align-items: flex-end; gap: .25rem; }
.ha-tier-badge, .ha-state-badge { border: 1px solid var(--ha-tier); color: var(--ha-tier); background: color-mix(in srgb, var(--ha-tier) 10%, transparent); padding: .16rem .38rem; font-size: .67rem; text-transform: uppercase; letter-spacing: .08em; font-family: var(--font-mono, ui-monospace, monospace); }
.ha-description { margin: 0; color: var(--color-muted-foreground); font-size: .86rem; line-height: 1.45; min-height: 2.4em; }
.ha-criteria { border: 1px solid color-mix(in srgb, var(--ha-tier) 28%, var(--color-border)); background: color-mix(in srgb, var(--ha-tier) 5%, transparent); }
.ha-criteria summary { cursor: pointer; padding: .5rem .65rem; color: var(--ha-tier); text-transform: uppercase; letter-spacing: .1em; font-size: .66rem; font-family: var(--font-mono, ui-monospace, monospace); user-select: none; }
.ha-criteria summary:hover { background: color-mix(in srgb, var(--ha-tier) 8%, transparent); }
.ha-criteria p { margin: 0; border-top: 1px solid color-mix(in srgb, var(--ha-tier) 18%, var(--color-border)); padding: .55rem .65rem .65rem; color: color-mix(in srgb, var(--color-foreground) 78%, var(--color-muted-foreground)); font-size: .76rem; line-height: 1.38; }
.ha-progress-row { display: flex; align-items: center; gap: .55rem; margin-top: 0; }
.ha-progress-track { flex: 1; height: .48rem; border: 1px solid color-mix(in srgb, var(--ha-tier) 34%, var(--color-border)); background: rgba(0,0,0,.22); overflow: hidden; }
.ha-progress-fill { height: 100%; background: linear-gradient(90deg, var(--ha-tier), color-mix(in srgb, var(--ha-tier) 48%, white)); }
.ha-progress-text { min-width: 5.4rem; text-align: right; font-family: var(--font-mono, ui-monospace, monospace); color: var(--color-muted-foreground); font-size: .72rem; }
.ha-evidence-slot { min-height: 1.65rem; margin-top: auto; display: flex; align-items: flex-end; }
.ha-evidence { width: 100%; display: flex; align-items: center; gap: .4rem; color: var(--color-muted-foreground); font-size: .72rem; min-width: 0; }
.ha-evidence-label { text-transform: uppercase; letter-spacing: .09em; font-family: var(--font-mono, ui-monospace, monospace); flex: 0 0 auto; }
.ha-evidence-title { min-width: 0; overflow: hidden; text-overflow: ellipsis; white-space: nowrap; color: color-mix(in srgb, var(--color-foreground) 84%, var(--color-muted-foreground)); }
.ha-evidence-empty { visibility: hidden; }
.ha-latest h2 { margin: 0 0 .5rem; font-size: 1rem; }
.ha-latest-row { display: flex; gap: .5rem; flex-wrap: wrap; }
.ha-chip { display: inline-flex; align-items: center; gap: .35rem; border: 1px solid var(--ha-tier); color: var(--ha-tier); background: color-mix(in srgb, var(--ha-tier) 10%, transparent); padding: .35rem .55rem; font-size: .8rem; }
.ha-chip-icon .ha-lucide { width: .95rem; height: .95rem; }
.ha-slot { border-style: dashed; }
.ha-slot-content { display: flex; gap: .6rem; align-items: center; padding: .65rem .8rem !important; font-size: .82rem; }
.ha-slot-star { color: #67e8f9; }
.ha-slot-muted { color: var(--color-muted-foreground); margin-left: auto; }
.ha-error { border-color: #ef4444; color: #fecaca; }
.ha-loading { color: var(--color-muted-foreground); font-family: var(--font-mono, ui-monospace, monospace); padding: 2rem; border: 1px dashed var(--color-border); }
.ha-guide { display: grid; grid-template-columns: minmax(0, 1.15fr) minmax(0, .85fr); gap: .75rem; }
.ha-guide > div { border: 1px solid var(--color-border); background: color-mix(in srgb, var(--color-card) 82%, transparent); padding: .85rem 1rem; }
.ha-guide strong { display: block; margin-bottom: .45rem; font-size: .78rem; text-transform: uppercase; letter-spacing: .12em; font-family: var(--font-mono, ui-monospace, monospace); }
.ha-guide p { margin: 0; color: var(--color-muted-foreground); font-size: .84rem; line-height: 1.45; }
.ha-tier-legend { display: flex; align-items: center; gap: .45rem; flex-wrap: wrap; }
.ha-tier-step { --ha-tier: var(--color-border); display: inline-flex; align-items: center; gap: .32rem; color: var(--ha-tier); border: 1px solid color-mix(in srgb, var(--ha-tier) 52%, var(--color-border)); background: color-mix(in srgb, var(--ha-tier) 8%, transparent); padding: .28rem .45rem; font-size: .72rem; font-family: var(--font-mono, ui-monospace, monospace); text-transform: uppercase; letter-spacing: .06em; }
.ha-tier-step i { width: .55rem; height: .55rem; background: var(--ha-tier); display: inline-block; }
.ha-tier-arrow { color: var(--color-muted-foreground); }
.ha-state-discovered { opacity: .92; }
.ha-state-discovered .ha-card-title { color: color-mix(in srgb, var(--color-foreground) 82%, var(--ha-tier)); }
.ha-state-secret { opacity: .5; filter: grayscale(.55); }
.ha-state-secret:after { content: ""; position: absolute; inset: 0; pointer-events: none; background: repeating-linear-gradient(-45deg, transparent 0 8px, rgba(255,255,255,.035) 8px 10px); }
.ha-tier-pending { --ha-tier: color-mix(in srgb, var(--color-muted-foreground) 64%, transparent); }
.ha-tier-copper { --ha-tier: #b87333; }
.ha-tier-silver { --ha-tier: #c0c7d2; }
.ha-tier-gold { --ha-tier: #f2c94c; box-shadow: 0 0 22px rgba(242,201,76,.08); }
.ha-tier-diamond { --ha-tier: #67e8f9; box-shadow: 0 0 24px rgba(103,232,249,.1); }
.ha-tier-olympian { --ha-tier: #c084fc; box-shadow: 0 0 34px rgba(192,132,252,.18), 0 0 12px rgba(242,201,76,.1); }
@media (max-width: 980px) { .ha-stats { grid-template-columns: repeat(2, minmax(0, 1fr)); } .ha-guide { grid-template-columns: 1fr; } }
@media (max-width: 800px) { .ha-stats { grid-template-columns: 1fr; } .ha-hero { flex-direction: column; align-items: stretch; } .ha-card-head { grid-template-columns: 3.1rem 1fr; } .ha-badges { grid-column: 1 / -1; align-items: flex-start; flex-direction: row; } }
.ha-secret-empty-content { padding: 1rem !important; }
.ha-secret-empty strong { display: block; margin-bottom: .35rem; }
.ha-secret-empty p { margin: 0; color: var(--color-muted-foreground); font-size: .86rem; line-height: 1.45; }
.ha-page-loading { animation: ha-fade-in .18s ease-out; }
.ha-loading-hero { align-items: center; }
.ha-scan-status { position: relative; z-index: 1; display: flex; align-items: center; gap: .8rem; min-width: 18rem; border: 1px solid color-mix(in srgb, #67e8f9 35%, var(--color-border)); background: color-mix(in srgb, var(--color-card) 78%, transparent); padding: .8rem .95rem; color: var(--color-foreground); }
.ha-scan-status strong { display: block; font-size: .82rem; text-transform: uppercase; letter-spacing: .1em; font-family: var(--font-mono, ui-monospace, monospace); }
.ha-scan-status p { margin: .25rem 0 0; font-size: .78rem; line-height: 1.35; color: var(--color-muted-foreground); }
.ha-scan-pulse { width: .72rem; height: .72rem; flex: 0 0 auto; border-radius: 999px; background: #67e8f9; box-shadow: 0 0 0 0 rgba(103,232,249,.55); animation: ha-pulse 1.35s ease-out infinite; }
.ha-skeleton-card { pointer-events: none; }
.ha-skeleton { position: relative; overflow: hidden; border-radius: 0; background: color-mix(in srgb, var(--color-muted-foreground) 16%, transparent); }
.ha-skeleton:after { content: ""; position: absolute; inset: 0; transform: translateX(-100%); background: linear-gradient(90deg, transparent, rgba(255,255,255,.14), transparent); animation: ha-shimmer 1.35s infinite; }
.ha-skeleton-stack { display: flex; flex-direction: column; gap: .45rem; padding-top: .15rem; }
.ha-skeleton-icon { width: 2.9rem; height: 2.9rem; }
.ha-skeleton-title { width: 72%; height: .95rem; }
.ha-skeleton-meta { width: 45%; height: .65rem; }
.ha-skeleton-badge { width: 4.4rem; height: 1.05rem; }
.ha-skeleton-badge-short { width: 3.6rem; }
.ha-skeleton-line { height: .78rem; width: 92%; }
.ha-skeleton-line-short { width: 68%; }
.ha-skeleton-criteria { height: 2.2rem; width: 100%; border: 1px solid color-mix(in srgb, var(--color-muted-foreground) 18%, var(--color-border)); }
.ha-skeleton-evidence { width: 58%; height: .8rem; }
.ha-skeleton-progress { flex: 1; height: .48rem; }
.ha-skeleton-progress-text { width: 4.6rem; height: .75rem; }
.ha-skeleton-stat-value { width: 56%; height: 1.35rem; margin-top: .55rem; }
.ha-skeleton-stat-hint { width: 76%; height: .7rem; margin-top: .55rem; }
.ha-loading-guide p { color: var(--color-muted-foreground); }
@keyframes ha-shimmer { 100% { transform: translateX(100%); } }
@keyframes ha-pulse { 0% { box-shadow: 0 0 0 0 rgba(103,232,249,.48); } 70% { box-shadow: 0 0 0 .65rem rgba(103,232,249,0); } 100% { box-shadow: 0 0 0 0 rgba(103,232,249,0); } }
@keyframes ha-fade-in { from { opacity: 0; transform: translateY(3px); } to { opacity: 1; transform: translateY(0); } }
.ha-loading-hero p, .ha-scan-status p, .ha-loading-guide p { text-transform: none; letter-spacing: normal; }
/* In-progress scan banner shown on the main page while the background scan
* is still walking through session history, so the user sees continuous
* progress (X / Y sessions · Z%) instead of guessing whether anything is
* happening. Reuses .ha-scan-pulse + ha-pulse keyframes from the loading page.
*/
.ha-scan-banner { display: flex; flex-direction: column; gap: .6rem; border: 1px solid color-mix(in srgb, #67e8f9 35%, var(--color-border)); background: color-mix(in srgb, var(--color-card) 78%, transparent); padding: .8rem .95rem; animation: ha-fade-in .18s ease-out; }
.ha-scan-banner-head { display: flex; align-items: center; gap: .8rem; }
.ha-scan-banner-text strong { display: block; font-size: .82rem; text-transform: uppercase; letter-spacing: .1em; font-family: var(--font-mono, ui-monospace, monospace); color: var(--color-foreground); }
.ha-scan-banner-text p { margin: .25rem 0 0; font-size: .78rem; line-height: 1.35; color: var(--color-muted-foreground); text-transform: none; letter-spacing: normal; }
.ha-scan-progress-track { height: .4rem; border: 1px solid color-mix(in srgb, #67e8f9 28%, var(--color-border)); background: rgba(0,0,0,.22); overflow: hidden; }
.ha-scan-progress-fill { height: 100%; background: linear-gradient(90deg, #67e8f9, color-mix(in srgb, #67e8f9 48%, white)); transition: width .4s ease-out; }


@ -0,0 +1,11 @@
{
"name": "hermes-achievements",
"label": "Achievements",
"description": "Steam-style achievements for vibe coding and agentic Hermes workflows.",
"icon": "Star",
"version": "0.3.1",
"tab": { "path": "/achievements", "position": "after:analytics" },
"entry": "dist/index.js",
"css": "dist/style.css",
"api": "plugin_api.py"
}

File diff suppressed because it is too large Load Diff


@ -0,0 +1,157 @@
# Hermes Achievements Performance Implementation Plan
Status: Ready for execution after hackathon review window
Constraint: Plugin remains frozen until judging is complete
Decision: `/overview` and top-banner slots are out of scope and will be removed.
---
## Phase 0 — Baseline & Safety (no behavior change)
### Task 0.1: Add perf benchmark script (local)
Objective: Repro baseline before/after.
Acceptance:
- Can print endpoint timings for `/achievements` (3 runs each, cold + warm).
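A minimal timing harness for Task 0.1 might look like this; the endpoint URL and request call are left as a comment because they depend on the local dashboard setup:

```python
import time

def time_runs(fn, runs=3):
    """Time a request callable `runs` times; report per-run latency in ms."""
    timings = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - t0) * 1000.0)
    return {"runs_ms": timings,
            "best_ms": min(timings),
            "worst_ms": max(timings)}

# In the real script, fn would issue the HTTP request, e.g. fetching
# /api/plugins/hermes-achievements/achievements from the local dashboard,
# once against a cold process and again warm. A cheap stand-in here:
report = time_runs(lambda: sum(range(10_000)))
```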
### Task 0.2: Define acceptance thresholds
Objective: Lock success criteria now.
Acceptance:
- Documented SLOs:
- `/achievements` p95 < 1s (cached)
- max active scan jobs = 1
---
## Phase 1 — Remove unused overview/slot surface (highest certainty)
### Task 1.1: Remove `/overview` backend route
Objective: Eliminate duplicate heavy endpoint path.
Acceptance:
- `plugin_api.py` no longer exposes `/overview`.
### Task 1.2: Remove slot registration and SummarySlot frontend code
Objective: Remove cross-tab banner fetch behavior.
Acceptance:
- No `registerSlot(..."sessions:top"...)` or `registerSlot(..."analytics:top"...)`.
- No frontend call to `api("/overview")`.
### Task 1.3: Update plugin manifest
Objective: Reflect final UI scope.
Acceptance:
- `manifest.json` removes `slots` declarations.
- Tab registration remains intact.
---
## Phase 2 — Shared snapshot persistence + single-flight for `/achievements`
### Task 2.1: Introduce snapshot store abstraction + on-disk persistence
Objective: Single source of truth for Achievements data that survives process restarts.
Acceptance:
- One structure contains dataset consumed by `/achievements`.
- Repeated requests do not recompute when cache is fresh.
- Snapshot persisted at `~/.hermes/plugins/hermes-achievements/scan_snapshot.json`.
### Task 2.2: Single-flight scan coordinator
Objective: Prevent concurrent recomputes.
Acceptance:
- Simultaneous requests result in one compute run.
### Task 2.3: Refactor `/achievements` to read snapshot
Objective: Remove direct repeated compute from request path.
Acceptance:
- `/achievements` does not run independent full recompute per request when cache is valid.
---
## Phase 3 — Stale-While-Revalidate
### Task 3.1: TTL state (`FRESH`/`STALE`)
Objective: Serve immediately when stale, refresh in background.
Acceptance:
- Cached response returned quickly even when expired.
- Refresh is asynchronous.
### Task 3.2: Add `scan-status` endpoint (optional)
Objective: Let UI/ops inspect scan state.
Acceptance:
- Returns state, last success time, last duration, last error.
### Task 3.3: Add metadata fields to `/achievements`
Objective: Improve transparency.
Acceptance:
- Response includes `generated_at`, `is_stale`, maybe `scan_id`.
---
## Phase 4 — Incremental Scanning (optional but recommended)
### Task 4.1: Add per-session checkpoint file
Objective: Track session-level changes, not just global scan time.
Acceptance:
- Checkpoint persisted at `~/.hermes/plugins/hermes-achievements/scan_checkpoint.json`.
- For each session: `session_id`, fingerprint (`updated_at`/message_count/hash), and cached contribution.
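A sketch of the fingerprint check, assuming `updated_at` and `message_count` are the fingerprint components (the acceptance note above also allows a content hash):

```python
def fingerprint(session):
    """Cheap change detector; a content hash could be added as a third
    component per the acceptance note above."""
    return (session["updated_at"], session["message_count"])

checkpoint = {
    "s1": {"fingerprint": (100.0, 12), "contribution": {"tool_calls": 3}},
}
# a session is rescanned only when its fingerprint no longer matches
changed = (fingerprint({"updated_at": 101.0, "message_count": 13})
           != checkpoint["s1"]["fingerprint"])
```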
### Task 4.2: Incremental aggregation
Objective: Recompute only changed/new sessions and reuse unchanged contributions.
Acceptance:
- Typical refresh time drops materially below full scan.
- Aggregate rebuild uses: subtract old contribution + add new contribution for changed sessions.
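The subtract-then-add rebuild rule can be sketched directly; the metric names are illustrative:

```python
def apply_incremental(totals, old_contrib, new_contrib):
    """Rebuild rule for one changed session: subtract the cached
    contribution, then add the freshly computed one."""
    out = dict(totals)
    for key, value in old_contrib.items():
        out[key] = out.get(key, 0) - value
    for key, value in new_contrib.items():
        out[key] = out.get(key, 0) + value
    return out

totals = {"tool_calls": 1000, "errors": 40}
updated = apply_incremental(totals,
                            old_contrib={"tool_calls": 10, "errors": 2},
                            new_contrib={"tool_calls": 25, "errors": 2})
```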
### Task 4.3: Full rebuild fallback
Objective: Preserve correctness.
Acceptance:
- Manual full rescan always possible.
- Schema/version changes invalidate checkpoint and force full rebuild.
---
## Test Plan
1. Unit tests
- Snapshot lifecycle transitions
- Dedupe logic under parallel requests
- `/achievements` response compatibility
2. Integration tests
- Opening the Achievements tab repeatedly triggers at most one heavy scan while one is in flight
- `/achievements` warm-cache load is fast
- manual rescan updates snapshot and timestamps
3. Manual benchmarks
- Compare pre/post `/achievements` timings with same history dataset
---
## Rollout Plan
1. Release internal branch with Phase 1 (remove overview/slots).
2. Validate no UI regression in Achievements tab.
3. Add Phase 2 snapshot/dedupe.
4. Add Phase 3 stale-while-revalidate + status metadata.
5. Optional: incremental scanner.
Rollback: keep old compute path behind temporary feature flag for one release window.
---
## Definition of Done
- Achievements tab remains fully functional (counts, latest, tiers, cards, filters).
- No `/overview` endpoint or slot calls remain.
- Repeated Achievements loads feel immediate after warm cache.
- Metrics/unlocks remain unchanged versus baseline.


@ -0,0 +1,219 @@
# Hermes Achievements Implementation Spec (Detailed)
This document is implementation-facing detail to execute the performance refactor later.
Decision scope: keep only Achievements tab flow; remove `/overview` + top-banner slot integration.
---
## A) Current Behavior Summary
- `evaluate_all()` performs:
- full `scan_sessions()`
- `SessionDB.list_sessions_rich(...)`
- `db.get_messages(session_id)` for each session
- text/tool regex analysis + aggregation + evaluation
- `/overview` and `/achievements` both currently call `evaluate_all()` directly.
- slot calls (`sessions:top`, `analytics:top`) currently invoke `/overview`.
Consequence: repeated full recomputes and contention.
---
## B) De-scope/Removal Changes
1. Remove backend route:
   - `GET /overview`
2. Remove frontend slot usage:
   - `SummarySlot` component
   - `registerSlot("sessions:top")`
   - `registerSlot("analytics:top")`
3. Remove manifest slot declarations:
   - `"slots": ["sessions:top", "analytics:top"]`
4. Keep:
   - tab route/page for Achievements
   - `/achievements` endpoint and full tab rendering
---
## C) Target Internal Interfaces
### 1) `SnapshotStore`
Responsibilities:
- hold latest computed snapshot in memory
- persist/load snapshot from disk
- expose age and staleness checks
Storage path:
- `~/.hermes/plugins/hermes-achievements/scan_snapshot.json`
Methods (conceptual):
- `get()` -> snapshot | null
- `set(snapshot)`
- `is_stale(ttl_seconds)`
### 2) `ScanCoordinator`
Responsibilities:
- single-flight guard for compute jobs
- track scan status
Methods:
- `run_if_needed(force: bool = false)`
- `get_status()`
State fields:
- `state`: `idle|running|failed`
- `started_at`, `finished_at`
- `last_error`
- `run_count`
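The single-flight guard reduces to "check the thread handle under a lock". A sketch under the same field names as above (the freshness check that `force` would bypass is elided here, so this is a shape, not the final implementation):

```python
import threading
import time
from typing import Callable, Optional


class ScanCoordinator:
    """Single-flight guard: at most one scan thread alive at a time (sketch)."""

    def __init__(self, scan_fn: Callable[[], None]):
        self._scan_fn = scan_fn
        self._lock = threading.Lock()
        self._thread: Optional[threading.Thread] = None
        self.state = "idle"  # idle | running | failed
        self.last_error: Optional[str] = None
        self.run_count = 0

    def run_if_needed(self, force: bool = False) -> bool:
        """Start a background scan unless one is in flight. True if started."""
        with self._lock:
            if self._thread is not None and self._thread.is_alive():
                return False  # join the in-flight run instead of duplicating it
            self.state = "running"
            self.started_at = time.time()
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()
            return True

    def _run(self) -> None:
        try:
            self._scan_fn()
            with self._lock:
                self.state, self.last_error = "idle", None
        except Exception as exc:  # surface via get_status, never crash the thread
            with self._lock:
                self.state, self.last_error = "failed", str(exc)
        finally:
            with self._lock:
                self.finished_at = time.time()
                self.run_count += 1

    def get_status(self) -> dict:
        with self._lock:
            return {"state": self.state, "run_count": self.run_count,
                    "last_error": self.last_error}
```

Note the lock covers both the liveness check and the state transitions, matching the concurrency contract later in this document.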
### 3) `build_snapshot()`
Responsibilities:
- execute current compute logic once
- on first run, perform full scan and materialize per-session contributions
- on subsequent runs, process only changed/new sessions via checkpoint fingerprints
- produce shape consumed by `/achievements`
Output:
- `achievements`
- count fields
- optional `scan_meta`
---
## D) Endpoint Behavior Matrix (No `/overview`)
| Endpoint | Cache fresh | Cache stale | No cache | Force rescan |
|---|---|---|---|---|
| `/achievements` | return cached | return stale + trigger bg refresh | blocking bootstrap scan | n/a |
| `/rescan` | trigger refresh | trigger refresh | trigger refresh | yes |
| `/scan-status` | status only | status only | status only | status only |
Notes:
- At most one scan run active.
- Other callers either await same run or receive stale snapshot according to policy.
---
## E) Data Shape (Proposed)
```json
{
  "generated_at": 0,
  "is_stale": false,
  "scan_meta": {
    "duration_ms": 0,
    "sessions_scanned": 0,
    "messages_scanned": 0,
    "mode": "full",
    "error": null
  },
  "achievements": [],
  "unlocked_count": 0,
  "discovered_count": 0,
  "secret_count": 0,
  "total_count": 0,
  "error": null
}
```
Compatibility guidance:
- Keep existing `/achievements` keys.
- Add metadata keys without breaking old callers.
Checkpoint file (new):
- `~/.hermes/plugins/hermes-achievements/scan_checkpoint.json`
Suggested checkpoint shape:
```json
{
  "schema_version": 1,
  "generated_at": 0,
  "sessions": {
    "<session_id>": {
      "fingerprint": {
        "updated_at": 0,
        "message_count": 0,
        "hash": "optional"
      },
      "contribution": {
        "metrics": {}
      }
    }
  }
}
```
Notes:
- fingerprint mismatch => recompute that session contribution only.
- unchanged fingerprint => reuse stored contribution.
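That diffing rule is small enough to sketch directly. Field names (`last_active`, `message_count`) and helper names here are assumptions drawn from the checkpoint shape above, not the real module's API:

```python
def session_fingerprint(session: dict) -> dict:
    """Cheap change-detector for one session (assumed field names)."""
    return {
        "updated_at": session.get("last_active", 0),
        "message_count": session.get("message_count", 0),
    }


def sessions_to_rescan(sessions: list, checkpoint: dict) -> list:
    """Return only the sessions whose fingerprint differs from the checkpoint."""
    cached = checkpoint.get("sessions", {})
    changed = []
    for session in sessions:
        entry = cached.get(session["id"])
        if entry is None or entry.get("fingerprint") != session_fingerprint(session):
            changed.append(session)  # new or mutated -> recompute contribution
    return changed
```

Everything not returned by `sessions_to_rescan` reuses its stored contribution, which is what makes warm rescans take seconds instead of minutes.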
---
## F) Concurrency Contract
- Any request path that needs fresh data must pass through single-flight coordinator.
- If a scan is running:
  - do not start a second scan
  - either await the in-flight run (bounded) or serve stale snapshot immediately
- lock scope must include scan start/finish state transitions.
---
## G) Error Handling Contract
- If refresh fails and prior snapshot exists:
  - return prior snapshot with `is_stale=true` and error metadata
- If refresh fails and no prior snapshot:
  - return explicit error response (current behavior equivalent)
- `scan-status` should always return last known state/error.
---
## H) Frontend Integration Contract
- Achievements page:
- one fetch on mount to `/achievements`
- optional background refresh indicator if stale
- no top-banner slot integration
- avoid duplicate in-flight calls during fast navigation by cancellation/debounce.
---
## I) Validation Checklist
- [ ] `/overview` route removed
- [ ] manifest has no `sessions:top`/`analytics:top` slots
- [ ] frontend has no `api("/overview")` calls
- [ ] repeated Achievements navigation does not create multiple heavy scans
- [ ] average warm load times meet SLOs
- [ ] unlock totals match pre-refactor baseline for same history
- [ ] no schema regression in `/achievements` response
---
## J) Suggested File Placement for Future Work
- backend changes: `dashboard/plugin_api.py`
- optional extraction:
  - `dashboard/perf_snapshot.py`
  - `dashboard/perf_scan_coordinator.py`
- frontend request hygiene: `dashboard/dist/index.js` (or source if available)
- plugin metadata: `dashboard/manifest.json`
- persisted runtime files:
  - `~/.hermes/plugins/hermes-achievements/state.json` (existing unlock state)
  - `~/.hermes/plugins/hermes-achievements/scan_snapshot.json` (new)
  - `~/.hermes/plugins/hermes-achievements/scan_checkpoint.json` (new)
---
## K) Post-Implementation Reporting Template
Record:
- dataset size (sessions/messages/tool calls)
- pre/post `/achievements` timings (cold/warm)
- whether single-flight dedupe triggered under repeated tab open
- any behavioral diffs in unlock counts


@@ -0,0 +1,174 @@
# Hermes Achievements Performance Spec (Post-Hackathon)
Status: Draft (no code changes yet)
Owner: hermes-achievements plugin
Scope: `dashboard/plugin_api.py` + `dashboard/dist/index.js` request behavior
Decision: **Drop `/overview` and top-banner slots**; keep only Achievements tab data path.
---
## 1) Problem Statement
Current plugin endpoints `/achievements` and `/overview` both execute a full history recomputation (`evaluate_all()`), which performs a full SessionDB scan on each request.
Observed on this machine/repo:
- ~83 sessions
- ~7,125 messages
- ~3,623 tool calls
- `evaluate_all()` ~13-16s per call
- `/achievements` ~13-15s per call
- `/overview` ~12-15s per call
- Overlap between endpoints increases perceived wait.
Given current product direction, `/overview` and cross-tab top-banner slots are not needed.
---
## 2) Goals
- Keep achievement correctness unchanged.
- Keep all Achievements-tab UX/data (unlocked/discovered/secrets/highest/latest/cards).
- Remove unused summary path (`/overview`) and slot wiring.
- Make Achievements tab faster by avoiding duplicate endpoint pathways.
- Ensure at most one heavy scan can run at a time.
Non-goals (phase 1):
- Rewriting achievement rules.
- Changing badge semantics/states.
---
## 3) Endpoint Semantics (Target)
### `GET /api/plugins/hermes-achievements/achievements`
Single source endpoint for Achievements UI.
Returns full payload used by the tab:
- `achievements`
- `unlocked_count`
- `discovered_count`
- `secret_count`
- `total_count`
- `error`
### `POST /api/plugins/hermes-achievements/rescan` (optional)
Manual refresh trigger.
Prefer async trigger + immediate status response.
### `GET /api/plugins/hermes-achievements/scan-status` (optional new)
Reports scan state for UX/ops.
### Removed
- `GET /api/plugins/hermes-achievements/overview`
---
## 4) UI Scope (Target)
Keep:
- Achievements page/tab (`/achievements` in plugin tab manifest)
- All existing Achievements tab stats/cards/filters
Remove:
- Top-banner summary slot components using `sessions:top` and `analytics:top`
- Any frontend call path to `/overview`
---
## 5) Runtime State Machine (for `/achievements`)
- `FRESH`: cached snapshot age <= TTL
- `STALE`: snapshot exists but expired
- `SCANNING`: background recompute running
- `FAILED`: last recompute failed, last good snapshot still served
Rules:
1. FRESH -> serve immediately.
2. STALE + not scanning -> serve stale snapshot immediately and launch background refresh.
3. SCANNING -> do not start another scan; join single-flight in-flight job.
4. No snapshot yet -> allow one blocking bootstrap scan.
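Rules 1-4 collapse into a single pure dispatch function, which is also the easiest piece to unit-test. A sketch under the assumptions of this spec (the action names and the `serve_pending` case for "no snapshot while the bootstrap scan runs" are illustrative):

```python
import time
from typing import Optional


def decide(snapshot: Optional[dict], scanning: bool, ttl_seconds: int = 120) -> str:
    """Map cache + scanner state to an action for GET /achievements."""
    if snapshot is None:
        # Rule 4: no snapshot yet. If the bootstrap scan is already running,
        # serve a pending placeholder instead of starting a second one.
        return "serve_pending" if scanning else "blocking_bootstrap"
    age = time.time() - snapshot.get("generated_at", 0)
    if age <= ttl_seconds:
        return "serve_fresh"                 # rule 1: FRESH
    if scanning:
        return "serve_stale"                 # rule 3: join in-flight run
    return "serve_stale_and_refresh"         # rule 2: STALE, kick refresh
```

Keeping the decision pure (no I/O, no locks) means the concurrency machinery only has to act on the returned action.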
---
## 6) Caching & Invalidation
### Phase 1
- In-memory cache + persisted snapshot file.
- TTL: 60-180 seconds (configurable).
- Single-flight dedupe for scan requests.
- Persist plugin data under:
  - `~/.hermes/plugins/hermes-achievements/scan_snapshot.json`
### Phase 2
- Incremental scan checkpoints with per-session fingerprints.
- Persist checkpoint data under:
  - `~/.hermes/plugins/hermes-achievements/scan_checkpoint.json`
- Checkpoint stores, per session:
  - `session_id`
  - fingerprint (`updated_at`, message_count, or hash)
  - cached per-session contribution used for aggregate recomposition
- Scan policy:
  - First run: full scan and materialize snapshot + checkpoint.
  - Next runs: process only new/changed sessions, reuse unchanged contributions.
  - Full rebuild only on:
    - schema/version change
    - checkpoint corruption
    - explicit full rescan
---
## 7) Frontend Contract
- Achievements tab requests `/achievements` once on mount.
- No slot-based summary fetches.
- If response says `is_stale=true`, UI may display “Updating in background”.
- Avoid duplicate mount-triggered calls and cancel stale requests on navigation.
---
## 8) SLO Targets
- `/achievements` p95 < 1s (cached)
- Max concurrent heavy scans: 1
- Background refresh should not block UI
---
## 9) Observability Requirements
Track:
- scan count
- scan duration avg/p95
- dedupe hit count (joined in-flight scans)
- stale-served count
- failures + last error
Expose minimal diagnostics in `/scan-status`.
---
## 10) Backward Compatibility
- Keep `/achievements` response shape backward-compatible.
- Removing `/overview` is acceptable because slot UI is intentionally removed.
- If temporary compatibility is needed, `/overview` can return static deprecation response for one release.
---
## 11) Risks
- Stale data confusion -> mitigate with `generated_at` and explicit refresh status.
- Cache invalidation bugs -> start with conservative TTL + manual rescan.
- Concurrency bugs -> protect scan section with lock/single-flight guard.
- Session mutation edge cases -> use per-session fingerprint invalidation (not global timestamp only).
---
## 12) Persistence Files (Explicit)
Plugin state directory:
- `~/.hermes/plugins/hermes-achievements/`
Files:
- `state.json` (existing): unlock tracking
- `scan_snapshot.json` (new): latest materialized achievements payload
- `scan_checkpoint.json` (new): per-session fingerprints + contributions for incremental refresh

(Two screenshot images added: 1.4 MiB and 1.3 MiB; binary files not shown.)


@@ -0,0 +1,156 @@
import importlib.util
import unittest
from pathlib import Path
MODULE_PATH = Path(__file__).resolve().parents[1] / "dashboard" / "plugin_api.py"
spec = importlib.util.spec_from_file_location("plugin_api", MODULE_PATH)
plugin_api = importlib.util.module_from_spec(spec)
spec.loader.exec_module(plugin_api)
class AchievementEngineTests(unittest.TestCase):
    def test_tool_call_stats_detect_tool_names_and_errors(self):
        messages = [
            {"role": "assistant", "tool_calls": [{"function": {"name": "terminal"}}]},
            {"role": "tool", "tool_name": "terminal", "content": "Error: port 3000 already in use"},
            {"role": "assistant", "tool_calls": [{"function": {"name": "web_search"}}]},
        ]
        stats = plugin_api.analyze_messages("s1", "Fix dev server", messages)
        self.assertEqual(stats["tool_call_count"], 2)
        self.assertEqual(stats["tool_names"], {"terminal", "web_search"})
        self.assertEqual(stats["error_count"], 1)
        self.assertIs(stats["port_conflict"], True)

    def test_tiered_achievement_reaches_highest_matching_tier(self):
        definition = {
            "id": "let_him_cook",
            "threshold_metric": "max_tool_calls_in_session",
            "tiers": [
                {"name": "Copper", "threshold": 10},
                {"name": "Silver", "threshold": 25},
                {"name": "Gold", "threshold": 50},
            ],
        }
        aggregate = {"max_tool_calls_in_session": 28}
        result = plugin_api.evaluate_tiered(definition, aggregate)
        self.assertIs(result["unlocked"], True)
        self.assertEqual(result["tier"], "Silver")
        self.assertEqual(result["progress"], 28)
        self.assertEqual(result["next_tier"], "Gold")

    def test_tiered_achievement_can_be_discovered_without_unlocking(self):
        definition = {
            "id": "terminal_goblin",
            "threshold_metric": "total_terminal_calls",
            "tiers": [{"name": "Copper", "threshold": 50}],
        }
        aggregate = {"total_terminal_calls": 12}
        result = plugin_api.evaluate_tiered(definition, aggregate)
        self.assertIs(result["unlocked"], False)
        self.assertIs(result["discovered"], True)
        self.assertEqual(result["state"], "discovered")
        self.assertEqual(result["progress"], 12)
        self.assertEqual(result["next_threshold"], 50)

    def test_secret_achievement_stays_hidden_without_progress(self):
        definition = {
            "id": "permission_denied_any_percent",
            "name": "Permission Denied Any%",
            "secret": True,
            "requirements": [{"metric": "permission_denied_events", "gte": 3}],
        }
        aggregate = {"permission_denied_events": 0}
        result = plugin_api.evaluate_requirements(definition, aggregate)
        display = plugin_api.display_achievement({**definition, **result})
        self.assertEqual(result["state"], "secret")
        self.assertEqual(display["name"], "???")
        self.assertNotIn("Permission", display["description"])

    def test_multi_condition_unlock_requires_all_requirements(self):
        definition = {
            "id": "full_send",
            "requirements": [
                {"metric": "max_terminal_calls_in_session", "gte": 10},
                {"metric": "max_file_tool_calls_in_session", "gte": 5},
                {"metric": "max_web_calls_in_session", "gte": 2},
            ],
        }
        partial = plugin_api.evaluate_requirements(definition, {
            "max_terminal_calls_in_session": 12,
            "max_file_tool_calls_in_session": 2,
            "max_web_calls_in_session": 0,
        })
        complete = plugin_api.evaluate_requirements(definition, {
            "max_terminal_calls_in_session": 12,
            "max_file_tool_calls_in_session": 6,
            "max_web_calls_in_session": 2,
        })
        self.assertEqual(partial["state"], "discovered")
        self.assertIs(partial["unlocked"], False)
        self.assertLess(partial["progress_pct"], 100)
        self.assertEqual(complete["state"], "unlocked")
        self.assertIs(complete["unlocked"], True)

    def test_catalog_has_60_plus_unique_achievements(self):
        ids = [achievement["id"] for achievement in plugin_api.ACHIEVEMENTS]
        self.assertGreaterEqual(len(ids), 60)
        self.assertEqual(len(ids), len(set(ids)))

    def test_model_provider_metrics_are_aggregated(self):
        sessions = [
            {"model_names": {"openai/gpt-5", "anthropic/claude-sonnet-4"}},
            {"model_names": {"google/gemini-pro", "mistral/large"}},
            {"model_names": {"qwen/qwen3"}},
        ]
        aggregate = plugin_api.aggregate_stats(sessions)
        self.assertEqual(aggregate["distinct_model_count"], 5)
        self.assertEqual(aggregate["distinct_provider_count"], 5)
        result = plugin_api.evaluate_definition(
            next(a for a in plugin_api.ACHIEVEMENTS if a["id"] == "five_model_flight"),
            aggregate,
        )
        self.assertEqual(result["state"], "unlocked")
        self.assertEqual(result["tier"], "Copper")

    def test_removed_noisy_achievements_are_not_in_catalog(self):
        ids = {achievement["id"] for achievement in plugin_api.ACHIEVEMENTS}
        self.assertNotIn("fallback_pilot", ids)
        self.assertNotIn("browser_sleuth", ids)
        self.assertNotIn("release_ritualist", ids)

    def test_open_weights_pilgrim_counts_only_local_model_metadata(self):
        aggregate_mentions_only = plugin_api.aggregate_stats([
            {"model_names": {"openai/gpt-5"}, "local_model_events": 999},
        ])
        aggregate_local_chat = plugin_api.aggregate_stats([
            {"model_names": {"openai/gpt-5"}},
            {"model_names": {"ollama/llama3"}},
        ])
        definition = next(a for a in plugin_api.ACHIEVEMENTS if a["id"] == "open_weights_pilgrim")
        self.assertEqual(aggregate_mentions_only["local_model_chat_sessions"], 0)
        self.assertEqual(plugin_api.evaluate_definition(definition, aggregate_mentions_only)["state"], "discovered")
        self.assertEqual(aggregate_local_chat["local_model_chat_sessions"], 1)
        self.assertEqual(plugin_api.evaluate_definition(definition, aggregate_local_chat)["state"], "unlocked")

    def test_config_surgeon_ignores_generic_config_mentions(self):
        stats = plugin_api.analyze_messages("s1", "Config talk", [{"content": "config config configuration not configured"}])
        self.assertEqual(stats["config_events"], 0)
        stats = plugin_api.analyze_messages("s2", "Real config", [{"content": "edited config.yaml, manifest.json, and .env.local"}])
        self.assertGreaterEqual(stats["config_events"], 3)


if __name__ == "__main__":
    unittest.main()


@@ -0,0 +1,366 @@
"""Tests for the bundled hermes-achievements dashboard plugin.
These target the two behaviors that matter for official integration:
* The 200-session scan cap is removed: the plugin now walks the entire
session history by default. Lifetime badges (tens of thousands of
tool calls) were unreachable before this fix on long-running installs.
* First-ever scans run in a background thread so the dashboard request
path never blocks, even on 8000+ session databases where a cold scan
takes minutes.
The upstream repo ships its own unittest suite under
``plugins/hermes-achievements/tests/`` covering the achievement engine
internals (tier math, secret-state handling, catalog invariants). These
tests live at the hermes-agent level and focus on the integration
contract: the plugin scans ALL of your sessions, not the first 200.
"""
from __future__ import annotations
import importlib.util
import sys
import threading
import time
from pathlib import Path
from typing import Any, Dict, List, Optional
import pytest
PLUGIN_MODULE_PATH = (
Path(__file__).resolve().parents[2]
/ "plugins"
/ "hermes-achievements"
/ "dashboard"
/ "plugin_api.py"
)
@pytest.fixture
def plugin_api(tmp_path, monkeypatch):
    """Load plugin_api with isolated ~/.hermes so state/snapshot files don't collide.

    We load the module fresh per test because the plugin keeps module-level
    caches (``_SNAPSHOT_CACHE``, ``_SCAN_STATUS``, background thread handle).
    Reloading gives each test a clean world.
    """
    monkeypatch.setattr(Path, "home", lambda: tmp_path)
    spec = importlib.util.spec_from_file_location(
        f"plugin_api_test_{id(tmp_path)}", PLUGIN_MODULE_PATH
    )
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    yield module
class _FakeSessionDB:
    """Stand-in for hermes_state.SessionDB that records scan calls."""

    def __init__(self, session_count: int):
        self.session_count = session_count
        self.last_limit: Optional[int] = None
        self.last_include_children: Optional[bool] = None
        self.list_calls = 0
        self.messages_calls = 0

    def list_sessions_rich(
        self,
        source: Optional[str] = None,
        exclude_sources: Optional[List[str]] = None,
        limit: int = 20,
        offset: int = 0,
        include_children: bool = False,
        project_compression_tips: bool = True,
    ) -> List[Dict[str, Any]]:
        self.last_limit = limit
        self.last_include_children = include_children
        self.list_calls += 1
        # SQLite semantics: LIMIT -1 = unlimited. Honor that here.
        effective = self.session_count if limit == -1 else min(self.session_count, limit)
        now = int(time.time())
        return [
            {
                "id": f"sess-{i}",
                "title": f"Session {i}",
                "preview": f"preview {i}",
                "started_at": now - (self.session_count - i) * 60,
                "last_active": now - (self.session_count - i) * 60 + 30,
                "source": "cli",
                "model": "test-model",
            }
            for i in range(effective)
        ]

    def get_messages(self, session_id: str) -> List[Dict[str, Any]]:
        self.messages_calls += 1
        return [
            {"role": "user", "content": f"ask {session_id}"},
            {
                "role": "assistant",
                "tool_calls": [{"function": {"name": "terminal"}}],
            },
            {"role": "tool", "tool_name": "terminal", "content": "ok"},
        ]

    def close(self) -> None:
        pass
def _install_fake_session_db(plugin_api, fake_db):
    """Inject a fake SessionDB so ``scan_sessions`` finds it via its local import."""
    fake_module = type(sys)("hermes_state")
    fake_module.SessionDB = lambda: fake_db
    sys.modules["hermes_state"] = fake_module
def test_scan_sessions_default_scans_all_history_not_first_200(plugin_api):
    """Bug regression: ``scan_sessions()`` used to cap at limit=200.

    A user with 8000+ sessions would only see ~2% of their history in
    achievement totals, making lifetime badges unreachable. The default
    now passes ``LIMIT -1`` (SQLite "unlimited") to ``list_sessions_rich``.
    """
    fake_db = _FakeSessionDB(session_count=500)  # > old 200 cap
    _install_fake_session_db(plugin_api, fake_db)
    result = plugin_api.scan_sessions()
    assert fake_db.last_limit == -1, (
        "scan_sessions() must pass LIMIT=-1 (unlimited) to list_sessions_rich "
        f"by default, got {fake_db.last_limit}"
    )
    assert fake_db.last_include_children is True, (
        "scan_sessions() must include subagent/compression child sessions so "
        "tool calls made in delegated agents still count toward achievements"
    )
    assert len(result["sessions"]) == 500
    assert result["scan_meta"]["sessions_total"] == 500


def test_scan_sessions_explicit_positive_limit_is_honored(plugin_api):
    """Callers can still pass a small limit for smoke tests."""
    fake_db = _FakeSessionDB(session_count=500)
    _install_fake_session_db(plugin_api, fake_db)
    result = plugin_api.scan_sessions(limit=10)
    assert fake_db.last_limit == 10
    assert len(result["sessions"]) == 10


def test_scan_sessions_zero_or_negative_limit_means_unlimited(plugin_api):
    """``limit=0`` and ``limit=-1`` both map to the unlimited path."""
    fake_db = _FakeSessionDB(session_count=300)
    _install_fake_session_db(plugin_api, fake_db)
    plugin_api.scan_sessions(limit=0)
    assert fake_db.last_limit == -1
    plugin_api.scan_sessions(limit=-1)
    assert fake_db.last_limit == -1
def test_evaluate_all_first_run_returns_pending_and_starts_background_scan(plugin_api):
    """First-ever evaluate_all with no cache returns a pending placeholder
    immediately and kicks off a background scan thread. Cold scans on
    large DBs take minutes; blocking the dashboard request path is not
    acceptable.
    """
    fake_db = _FakeSessionDB(session_count=50)
    _install_fake_session_db(plugin_api, fake_db)
    # Wrap _run_scan_and_update_cache so we can release it on demand,
    # simulating a slow cold scan without actually waiting.
    scan_started = threading.Event()
    allow_scan_finish = threading.Event()
    original_run = plugin_api._run_scan_and_update_cache

    def gated_run(*args, **kwargs):
        scan_started.set()
        allow_scan_finish.wait(timeout=5)
        original_run(*args, **kwargs)

    plugin_api._run_scan_and_update_cache = gated_run
    t0 = time.time()
    result = plugin_api.evaluate_all()
    elapsed = time.time() - t0
    # Immediate return — should not block waiting for the scan.
    assert elapsed < 1.0, f"evaluate_all blocked for {elapsed:.2f}s on first run"
    assert result["scan_meta"]["mode"] == "pending"
    assert result["unlocked_count"] == 0
    # Catalog still rendered so UI has something to draw.
    assert result["total_count"] >= 60
    # Background scan is running.
    assert scan_started.wait(timeout=2), "background scan did not start"
    # Let the scan complete, then a second call returns real data.
    allow_scan_finish.set()
    # Wait for thread to finish.
    thread = plugin_api._BACKGROUND_SCAN_THREAD
    assert thread is not None
    thread.join(timeout=5)
    assert not thread.is_alive()
    second = plugin_api.evaluate_all()
    assert second["scan_meta"]["mode"] != "pending"
    assert second["scan_meta"].get("sessions_total") == 50
def test_evaluate_all_stale_cache_serves_stale_and_refreshes_in_background(plugin_api):
    """When the snapshot is on-disk but older than TTL, evaluate_all returns
    the stale data immediately and kicks a background refresh. Users don't
    stare at a loading spinner every time TTL expires.
    """
    fake_db = _FakeSessionDB(session_count=10)
    _install_fake_session_db(plugin_api, fake_db)
    # Seed a stale snapshot on disk.
    stale_generated_at = int(time.time()) - plugin_api.SNAPSHOT_TTL_SECONDS - 60
    stale_payload = {
        "achievements": [],
        "sessions": [],
        "aggregate": {},
        "scan_meta": {"mode": "full", "sessions_total": 1, "sessions_rescanned": 1, "sessions_reused": 0},
        "error": None,
        "unlocked_count": 0,
        "discovered_count": 0,
        "secret_count": 0,
        "total_count": 0,
        "generated_at": stale_generated_at,
    }
    plugin_api.save_snapshot(stale_payload)
    t0 = time.time()
    result = plugin_api.evaluate_all()
    elapsed = time.time() - t0
    assert elapsed < 1.0, f"evaluate_all blocked for {elapsed:.2f}s serving stale data"
    assert result["generated_at"] == stale_generated_at
    # Background scan should be running or have completed.
    thread = plugin_api._BACKGROUND_SCAN_THREAD
    assert thread is not None
    thread.join(timeout=5)
    fresh = plugin_api.evaluate_all()
    assert fresh["generated_at"] >= stale_generated_at
def test_evaluate_all_force_runs_synchronously(plugin_api):
    """Manual /rescan (force=True) blocks the caller — users clicking
    the rescan button expect up-to-date data when the call returns.
    """
    fake_db = _FakeSessionDB(session_count=25)
    _install_fake_session_db(plugin_api, fake_db)
    result = plugin_api.evaluate_all(force=True)
    # Synchronous — snapshot is fresh on return.
    assert result["scan_meta"].get("sessions_total") == 25
    assert result["scan_meta"]["mode"] in ("full", "incremental")
def test_start_background_scan_is_idempotent_while_running(plugin_api):
    """Multiple concurrent dashboard requests must not spawn duplicate scans."""
    fake_db = _FakeSessionDB(session_count=5)
    _install_fake_session_db(plugin_api, fake_db)
    release = threading.Event()
    original_run = plugin_api._run_scan_and_update_cache

    def gated_run(*args, **kwargs):
        release.wait(timeout=5)
        original_run(*args, **kwargs)

    plugin_api._run_scan_and_update_cache = gated_run
    plugin_api._start_background_scan()
    first_thread = plugin_api._BACKGROUND_SCAN_THREAD
    assert first_thread is not None and first_thread.is_alive()
    plugin_api._start_background_scan()
    plugin_api._start_background_scan()
    assert plugin_api._BACKGROUND_SCAN_THREAD is first_thread
    release.set()
    first_thread.join(timeout=5)
def test_background_scan_publishes_partial_snapshots(plugin_api):
    """The background scanner publishes intermediate snapshots to the cache
    every ~N sessions. Each dashboard refresh during a long cold scan sees
    more badges unlocked instead of staring at zeros for minutes and then
    having everything pop at the end.
    """
    fake_db = _FakeSessionDB(session_count=750)
    _install_fake_session_db(plugin_api, fake_db)
    # Record every partial snapshot the scanner publishes.
    partial_snapshots: List[Dict[str, Any]] = []
    original_compute_from_scan = plugin_api._compute_from_scan

    def recording_compute(scan, *, is_partial=False):
        result = original_compute_from_scan(scan, is_partial=is_partial)
        if is_partial:
            partial_snapshots.append(result)
        return result

    plugin_api._compute_from_scan = recording_compute
    # Scan 750 sessions with progress_every=250 → expect 2 intermediate
    # publications (at 250 and 500; the final 750 call goes through the
    # finished, non-partial path).
    plugin_api._run_scan_and_update_cache(publish_partial_snapshots=True)
    assert len(partial_snapshots) >= 2, (
        f"expected at least 2 partial publications on a 750-session scan with "
        f"progress_every=250, got {len(partial_snapshots)}"
    )
    # Partial snapshots should report growing session counts.
    counts = [p["scan_meta"].get("sessions_scanned_so_far") for p in partial_snapshots]
    assert counts == sorted(counts), f"partial session counts not monotonic: {counts}"
    assert counts[0] < 750 and counts[-1] < 750, (
        f"partial counts should be less than the final total; got {counts}"
    )
    # Every partial reports the expected end-state total so the UI can
    # show an accurate progress bar.
    for p in partial_snapshots:
        assert p["scan_meta"].get("sessions_expected_total") == 750
    # Final snapshot in cache is the real (non-partial) one.
    final = plugin_api._SNAPSHOT_CACHE
    assert final is not None
    assert final["scan_meta"].get("mode") != "in_progress"
    assert final["scan_meta"].get("sessions_total") == 750
def test_partial_snapshots_do_not_persist_unlock_timestamps(plugin_api):
    """Intermediate snapshots must not write to state.json — an unlock
    that appears at 30% scan progress could disappear when a later session
    rebalances the aggregate. Only the final snapshot records ``unlocked_at``.
    """
    fake_db = _FakeSessionDB(session_count=10)
    _install_fake_session_db(plugin_api, fake_db)
    # Seed empty state, then invoke partial compute directly.
    plugin_api.save_state({"unlocks": {}})
    partial_scan = {
        "sessions": [{"session_id": "x", "tool_call_count": 99999, "tool_names": set()}],
        "aggregate": {"max_tool_calls_in_session": 99999, "total_tool_calls": 99999},
        "scan_meta": {"mode": "in_progress"},
    }
    result = plugin_api._compute_from_scan(partial_scan, is_partial=True)
    # Some achievements should evaluate as unlocked in this aggregate...
    assert any(a["unlocked"] for a in result["achievements"])
    # ...but state.json on disk stays empty (no timestamps were recorded).
    persisted = plugin_api.load_state()
    assert persisted.get("unlocks", {}) == {}, (
        "partial scans must not record unlock timestamps — a later session "
        "could change whether the badge deserves to be unlocked yet"
    )


@@ -62,6 +62,7 @@ The repo ships these bundled plugins under `plugins/`. All are opt-in — enable
| `image_gen/openai` | image backend | OpenAI `gpt-image-2` image generation backend (alternative to FAL) |
| `image_gen/openai-codex` | image backend | OpenAI image generation via Codex OAuth |
| `image_gen/xai` | image backend | xAI `grok-2-image` backend |
| `hermes-achievements` | dashboard tab | Steam-style collectible badges generated from your real Hermes session history |
| `example-dashboard` | dashboard example | Reference dashboard plugin for [Extending the Dashboard](./extending-the-dashboard.md) |
| `strike-freedom-cockpit` | dashboard skin | Sample custom dashboard skin |
@@ -208,6 +209,57 @@ The agent kicks off the meeting join, streams the transcription back into its co
**Disabling:** `hermes plugins disable google_meet`. Any cached transcripts and recordings stay in `~/.hermes/cache/google_meet/` until you remove them.
### hermes-achievements
Adds a **Steam-style achievements tab to the dashboard** — 60+ collectible, tiered badges generated from your real Hermes session history. Badges cover tool-chain feats, debugging patterns, vibe-coding streaks, skill/memory usage, model/provider variety, and lifestyle quirks (weekend and night sessions). Originally authored by [@PCinkusz](https://github.com/PCinkusz) as an external plugin; brought in-tree so it stays in lockstep with Hermes feature changes.
**How it works:**
- Scans your entire `~/.hermes/state.db` session history on the dashboard backend
- Per-session stats are cached by `(started_at, last_active)` fingerprint, so only new or changed sessions re-analyze on subsequent scans
- First-ever scan runs in a background thread — the dashboard never blocks waiting for it, even on databases with thousands of sessions
- Unlock state is persisted to `$HERMES_HOME/plugins/hermes-achievements/state.json`
**Tier progression:** Copper → Silver → Gold → Diamond → Olympian. Each card exposes a "What counts" section listing the exact metric being tracked.
**Achievement states:**
| State | Meaning |
|---|---|
| Unlocked | At least one tier achieved |
| Discovered | Known achievement, progress visible, not yet earned |
| Secret | Hidden until Hermes detects the first related signal in your history |
**API** — routes mount under `/api/plugins/hermes-achievements/`:
| Endpoint | Purpose |
|---|---|
| `GET /achievements` | Full catalog with per-badge unlock state (returns a pending placeholder while the first cold scan is running) |
| `GET /scan-status` | State of the background scanner: `idle` / `running` / `failed`, last duration, run count |
| `GET /recent-unlocks` | Twenty most recently unlocked badges, newest first |
| `GET /sessions/{id}/badges` | Badges earned primarily in one specific session |
| `POST /rescan` | Manual synchronous rescan (blocks; use when the user clicks the rescan button) |
| `POST /reset-state` | Clear unlock history and cached snapshot |
**State files** — live under `$HERMES_HOME/plugins/hermes-achievements/`:
| File | Contents |
|---|---|
| `state.json` | Unlock history: which badges you've earned and when. Stable across Hermes updates. |
| `scan_snapshot.json` | Last completed scan payload (served immediately on dashboard load) |
| `scan_checkpoint.json` | Per-session stats cache keyed by fingerprint (makes warm rescans fast) |
**Performance notes:**
- Cold scan on ~8,000 sessions takes a few minutes. It runs in a background thread on first dashboard request; the UI sees a pending placeholder and polls `/scan-status`.
- **Incremental results during a cold scan** — the scanner publishes a partial snapshot every ~250 sessions so each dashboard refresh shows more badges unlocked as the scan progresses. No minute-long stare at zeros.
- Warm rescan reuses per-session stats for every session whose `started_at` + `last_active` fingerprint matches the checkpoint — completes in seconds even on large histories.
- The in-memory snapshot TTL is 120s; stale requests serve the old snapshot immediately and kick a background refresh. You never wait on a spinner just because TTL expired.
**Enabling:** Nothing to enable — `hermes-achievements` is a dashboard-only plugin (no lifecycle hooks, no model-visible tools). It auto-registers as a tab in `hermes dashboard` on first launch. The `plugins.enabled` config only gates lifecycle/tool plugins; dashboard plugins are discovered purely via their `dashboard/manifest.json`.
**Opting out:** Delete or rename `plugins/hermes-achievements/dashboard/manifest.json`, or override it with a user plugin of the same name in `~/.hermes/plugins/hermes-achievements/` that ships no dashboard. The plugin's state files under `$HERMES_HOME/plugins/hermes-achievements/` survive — reinstalling preserves your unlock history.
## Adding a bundled plugin
Bundled plugins are written exactly like any other Hermes plugin — see [Build a Hermes Plugin](/docs/guides/build-a-hermes-plugin). The only differences are: