# molecule-skill-llm-judge — LLM-as-Judge Gate
Plugin for Claude Code. Scores whether an agent's deliverable (a PR, a delegation result, a generated config) actually addresses the original request — the failure mode unit tests miss.
## The problem it solves
Unit tests verify the code ran. They don't verify it did the right thing for the customer's actual request. An agent can implement the wrong solution perfectly.
## When to use
After an agent (PM, Dev Lead, QA, etc.) produces a deliverable:
- A PR opened in response to an issue
- A delegation result (an A2A `message/send` response)
- A generated config or template
- A code review they posted
Trigger: "Agent came back with 'done' — before we believe them."
## What it does
- Presents the original request and the agent's deliverable to an LLM judge
- Scores: does the deliverable actually address the request?
- Reports: passes, partial, or fails — with evidence
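The judge flow above could be sketched as follows. This is a minimal illustration, not the plugin's actual implementation: the prompt wording, the three-value JSON verdict schema, and the helper names (`build_judge_prompt`, `parse_verdict`) are all assumptions made for the example.

```python
import json

# Hypothetical judge prompt; the real plugin's prompt wording is not shown here.
JUDGE_PROMPT = """You are a strict reviewer. Given the ORIGINAL REQUEST and the
agent's DELIVERABLE, decide whether the deliverable actually addresses the request.
Reply with JSON: {{"verdict": "pass" | "partial" | "fail", "evidence": "<one sentence>"}}

ORIGINAL REQUEST:
{request}

DELIVERABLE:
{deliverable}
"""

def build_judge_prompt(request: str, deliverable: str) -> str:
    """Assemble the text presented to the judge LLM."""
    return JUDGE_PROMPT.format(request=request, deliverable=deliverable)

def parse_verdict(raw: str) -> dict:
    """Parse the judge's JSON reply, rejecting anything outside the three verdicts."""
    result = json.loads(raw)
    if result.get("verdict") not in {"pass", "partial", "fail"}:
        raise ValueError(f"unexpected verdict: {result.get('verdict')!r}")
    return result

# Example: a judge reply for a PR that only half-addresses the issue.
reply = '{"verdict": "partial", "evidence": "Fixes the crash but skips the requested retry logic."}'
print(parse_verdict(reply)["verdict"])  # → partial
```

The point of the strict parse step is that the gate's output feeds automation: a verdict outside the agreed set should fail loudly rather than be silently treated as a pass.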
## Installation

### In org template (org.yaml)

```yaml
plugins:
  - molecule-skill-llm-judge
```

### From URL

```
github://Molecule-AI/molecule-ai-plugin-molecule-skill-llm-judge
```
## License
Business Source License 1.1 — © Molecule AI.