molecule-core

Author	SHA1	Message	Date
Hongming Wang	8da43984f7	Merge pull request #56 from Molecule-AI/docs/sync-2026-04-14-tick-3 docs: sync documentation with 2026-04-14 tick-3 merges (#53, #54, #55)	2026-04-14 12:24:16 -07:00
Hongming Wang	b123294cf2	Merge pull request #56 from Molecule-AI/docs/sync-2026-04-14-tick-3 docs: sync documentation with 2026-04-14 tick-3 merges (#53, #54, #55)	2026-04-14 12:24:16 -07:00
Hongming Wang	119b02c544	feat(plugins): split guardrails into 12 modular plugins Replaces the proposed monolithic molecule-guardrails plugin with 12 single-purpose plugins users can install à la carte. Powered by a small extension to the AgentskillsAdaptor base class so any plugin can ship hooks/, commands/, and a settings-fragment.json without writing a custom adapter. ## Base adapter changes workspace-template/plugins_registry/builtins.py + sdk/python/molecule_plugin/builtins.py (both copies — drift-tested): - New _install_claude_layer() helper called at the end of install() - Conditionally copies hooks/ → /configs/.claude/hooks/ (preserving exec bit) - Conditionally copies commands/.md → /configs/.claude/commands/ - Conditionally merges settings-fragment.json into /configs/.claude/settings.json with ${CLAUDE_DIR} placeholder rewritten to the workspace's absolute install path. Existing user hooks are preserved (deep-merge by event name). - All steps no-op when the plugin doesn't ship the corresponding files, so existing skill+rule plugins (molecule-dev, superpowers, ecc, browser-automation) are unchanged. Drift test (tests/test_plugins_builtins_drift.py) still passes. 
## 12 new plugins Hook plugins (ambient enforcement): - molecule-careful-bash — refuses destructive bash; ships careful-mode skill - molecule-freeze-scope — locks edits via .claude/freeze - molecule-audit-trail — appends every Edit/Write to audit.jsonl - molecule-session-context — auto-loads cron-learnings at session start - molecule-prompt-watchdog — injects warnings on destructive prompt keywords Skill plugins (on-demand): - molecule-skill-code-review — 16-criteria multi-axis review - molecule-skill-cross-vendor-review — adversarial second-model review - molecule-skill-llm-judge — deliverable-vs-request scoring - molecule-skill-update-docs — post-merge doc sync - molecule-skill-cron-learnings — operational-memory JSONL format Workflow plugins (slash commands): - molecule-workflow-triage — /triage full PR-triage cycle - molecule-workflow-retro — /retro + cron-retro skill, weekly retrospective Each ships only what it needs — most have just plugin.yaml + skills/ or hooks/ + adapter (one-line stub: `from plugins_registry.builtins import AgentskillsAdaptor as Adaptor`). Total ~120 files but each plugin is small and self-contained. ## Verification - python3 -m molecule_plugin validate plugins/molecule- → all 13 valid (12 new + pre-existing molecule-dev) - End-to-end install smoke test on representative samples: hook plugin (molecule-careful-bash), skill-only plugin (molecule-skill-code-review), workflow plugin (molecule-workflow-triage). All produce expected /configs/ tree, settings.json paths rewritten, exec bits preserved, zero warnings. - workspace-template pytest tests/test_plugins_builtins_drift.py → passes (SDK + runtime stay in sync). ## CLAUDE.md repo-doc updated Lists all 12 new plugins under the existing Plugins section, organized by category (hook / skill / workflow). Each entry one line, recommend- together hints where dependencies make sense. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 12:20:04 -07:00
Hongming Wang	90a513d1d0	feat(plugins): split guardrails into 12 modular plugins Replaces the proposed monolithic molecule-guardrails plugin with 12 single-purpose plugins users can install à la carte. Powered by a small extension to the AgentskillsAdaptor base class so any plugin can ship hooks/, commands/, and a settings-fragment.json without writing a custom adapter. ## Base adapter changes workspace-template/plugins_registry/builtins.py + sdk/python/molecule_plugin/builtins.py (both copies — drift-tested): - New _install_claude_layer() helper called at the end of install() - Conditionally copies hooks/ → /configs/.claude/hooks/ (preserving exec bit) - Conditionally copies commands/.md → /configs/.claude/commands/ - Conditionally merges settings-fragment.json into /configs/.claude/settings.json with ${CLAUDE_DIR} placeholder rewritten to the workspace's absolute install path. Existing user hooks are preserved (deep-merge by event name). - All steps no-op when the plugin doesn't ship the corresponding files, so existing skill+rule plugins (molecule-dev, superpowers, ecc, browser-automation) are unchanged. Drift test (tests/test_plugins_builtins_drift.py) still passes. 
## 12 new plugins Hook plugins (ambient enforcement): - molecule-careful-bash — refuses destructive bash; ships careful-mode skill - molecule-freeze-scope — locks edits via .claude/freeze - molecule-audit-trail — appends every Edit/Write to audit.jsonl - molecule-session-context — auto-loads cron-learnings at session start - molecule-prompt-watchdog — injects warnings on destructive prompt keywords Skill plugins (on-demand): - molecule-skill-code-review — 16-criteria multi-axis review - molecule-skill-cross-vendor-review — adversarial second-model review - molecule-skill-llm-judge — deliverable-vs-request scoring - molecule-skill-update-docs — post-merge doc sync - molecule-skill-cron-learnings — operational-memory JSONL format Workflow plugins (slash commands): - molecule-workflow-triage — /triage full PR-triage cycle - molecule-workflow-retro — /retro + cron-retro skill, weekly retrospective Each ships only what it needs — most have just plugin.yaml + skills/ or hooks/ + adapter (one-line stub: `from plugins_registry.builtins import AgentskillsAdaptor as Adaptor`). Total ~120 files but each plugin is small and self-contained. ## Verification - python3 -m molecule_plugin validate plugins/molecule- → all 13 valid (12 new + pre-existing molecule-dev) - End-to-end install smoke test on representative samples: hook plugin (molecule-careful-bash), skill-only plugin (molecule-skill-code-review), workflow plugin (molecule-workflow-triage). All produce expected /configs/ tree, settings.json paths rewritten, exec bits preserved, zero warnings. - workspace-template pytest tests/test_plugins_builtins_drift.py → passes (SDK + runtime stay in sync). ## CLAUDE.md repo-doc updated Lists all 12 new plugins under the existing Plugins section, organized by category (hook / skill / workflow). Each entry one line, recommend- together hints where dependencies make sense. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 12:20:04 -07:00
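The settings-fragment merge described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code in `builtins.py`: the function names `rewrite_placeholder` and `merge_settings` are hypothetical, and the shape of a hook entry is assumed. Only the `${CLAUDE_DIR}` placeholder and the "existing user hooks are preserved, merged by event name" behaviour come from the commit message.

```python
import json

PLACEHOLDER = "${CLAUDE_DIR}"  # placeholder named in the commit message


def rewrite_placeholder(node, claude_dir):
    """Recursively replace the placeholder in every string value."""
    if isinstance(node, str):
        return node.replace(PLACEHOLDER, claude_dir)
    if isinstance(node, list):
        return [rewrite_placeholder(item, claude_dir) for item in node]
    if isinstance(node, dict):
        return {key: rewrite_placeholder(value, claude_dir) for key, value in node.items()}
    return node


def merge_settings(existing, fragment, claude_dir):
    """Return a new settings dict with the fragment's hooks appended per event.

    Hypothetical helper: existing entries are never removed or reordered.
    """
    merged = json.loads(json.dumps(existing))  # cheap deep copy of JSON data
    fragment = rewrite_placeholder(fragment, claude_dir)
    hooks = merged.setdefault("hooks", {})
    for event, entries in fragment.get("hooks", {}).items():
        current = hooks.setdefault(event, [])
        for entry in entries:
            if entry not in current:  # keeps a repeated install idempotent
                current.append(entry)
    for key, value in fragment.items():
        if key != "hooks":
            merged.setdefault(key, value)  # never overwrite a user setting
    return merged
```

Appending instead of replacing is what keeps a user's own hooks alive when a plugin registers a handler for the same event.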
Hongming Wang	eea36b9f92	feat(.claude): ambient hooks + sequential-thinking MCP + /triage command Skills are opt-in (I have to remember to invoke them). Hooks are ambient — they fire on every matching event automatically. This PR moves the careful-mode and learnings discipline from "doc I should read" to "harness-enforced behavior I cannot bypass". ## 6 new hooks (.claude/hooks/) - pre-bash-careful — REFUSES git push --force to main, rm -rf at root, DROP TABLE against prod schema. WARNs on force-with-lease, gh pr/ issue close. Tested: blocks the destructive case, allows safe ones. - pre-edit-freeze — implements /freeze. When .claude/freeze contains a path glob, edits outside it are denied. Tested: edits to PLAN.md blocked when scope locked to platform/internal/handlers/. - session-start-context — auto-loads last 20 cron-learnings, freeze status, open-PR/issue counts as additionalContext at session start. Tested: emits valid SessionStart JSON. - post-edit-audit — appends every Edit/Write to .claude/audit.jsonl (gitignored). One-line records {ts, tool, file, ok}. Tested writes. - user-prompt-tag — injects context warnings when prompt mentions force-push, drop-table, "delete all", "push to main", etc. Tested: emits warning for "force push the fix to main". - subagent-stop-judge — off by default; touch .claude/judge-subagents to enable. When on, prompts orchestrator to verify subagent's last message addresses the original task. Cost-free MVP (no LLM call yet). All hooks are Python (jq isn't on the hook PATH on macOS — Python is). Shared helpers in _lib.py: read_input, deny_pretooluse, add_context, warn_to_stderr. ## settings.json — wires all 6 hooks Adds SessionStart, UserPromptSubmit, SubagentStop event handlers. Existing PreToolUse:Bash + PostToolUse:Edit chains gain the new hooks alongside the existing ones (check-inbox.sh, echo reminder). 
Adds @modelcontextprotocol/server-sequential-thinking MCP server for structured chain-of-thought scratchpad — useful when triaging multiple PRs in parallel without losing context. ## .claude/commands/triage.md — slash command shortcut Manual /triage runs the same flow as the c5074cd5 hourly cron, on demand. Saves ~4KB of prompt every invocation by pulling the cron prompt out of working memory. ## CLAUDE.md additions New "Agent operating rules (auto-loaded — read first)" section right after Ecosystem Context. Documents: - Cron / triage discipline (read learnings, treat docs PRs touching CLAUDE.md/PLAN.md as noteworthy, write per-tick reflections) - Table of all 6 hooks active in this repo - List of skills and how to invoke them - Standing rules (inviolable) consolidated for the agent This block auto-loads into every conversation context — free behavior change without me remembering to opt in. ## .gitignore audit.jsonl, freeze, judge-subagents, per-tick-reflections.md are all local operational state, never committed. ## Verification - echo '{"tool_input":{"command":"git push --force origin main"}}' \| bash pre-bash-careful.sh → emits deny JSON ✓ - Same for git status (safe command) → empty output, exit 0 ✓ - pre-edit-freeze with .claude/freeze=platform/handlers/ blocks edits to PLAN.md, allows edits inside the locked path ✓ - post-edit-audit appends valid JSONL ✓ - session-start-context emits additionalContext with PR/issue counts ✓ - user-prompt-tag emits warning for "force push to main" prompt ✓ - python3 -c "json.load(open('.claude/settings.json'))" → valid ✓ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 12:00:35 -07:00
Hongming Wang	3f8eb7406f	feat(.claude): ambient hooks + sequential-thinking MCP + /triage command Skills are opt-in (I have to remember to invoke them). Hooks are ambient — they fire on every matching event automatically. This PR moves the careful-mode and learnings discipline from "doc I should read" to "harness-enforced behavior I cannot bypass". ## 6 new hooks (.claude/hooks/) - pre-bash-careful — REFUSES git push --force to main, rm -rf at root, DROP TABLE against prod schema. WARNs on force-with-lease, gh pr/ issue close. Tested: blocks the destructive case, allows safe ones. - pre-edit-freeze — implements /freeze. When .claude/freeze contains a path glob, edits outside it are denied. Tested: edits to PLAN.md blocked when scope locked to platform/internal/handlers/. - session-start-context — auto-loads last 20 cron-learnings, freeze status, open-PR/issue counts as additionalContext at session start. Tested: emits valid SessionStart JSON. - post-edit-audit — appends every Edit/Write to .claude/audit.jsonl (gitignored). One-line records {ts, tool, file, ok}. Tested writes. - user-prompt-tag — injects context warnings when prompt mentions force-push, drop-table, "delete all", "push to main", etc. Tested: emits warning for "force push the fix to main". - subagent-stop-judge — off by default; touch .claude/judge-subagents to enable. When on, prompts orchestrator to verify subagent's last message addresses the original task. Cost-free MVP (no LLM call yet). All hooks are Python (jq isn't on the hook PATH on macOS — Python is). Shared helpers in _lib.py: read_input, deny_pretooluse, add_context, warn_to_stderr. ## settings.json — wires all 6 hooks Adds SessionStart, UserPromptSubmit, SubagentStop event handlers. Existing PreToolUse:Bash + PostToolUse:Edit chains gain the new hooks alongside the existing ones (check-inbox.sh, echo reminder). 
Adds @modelcontextprotocol/server-sequential-thinking MCP server for structured chain-of-thought scratchpad — useful when triaging multiple PRs in parallel without losing context. ## .claude/commands/triage.md — slash command shortcut Manual /triage runs the same flow as the c5074cd5 hourly cron, on demand. Saves ~4KB of prompt every invocation by pulling the cron prompt out of working memory. ## CLAUDE.md additions New "Agent operating rules (auto-loaded — read first)" section right after Ecosystem Context. Documents: - Cron / triage discipline (read learnings, treat docs PRs touching CLAUDE.md/PLAN.md as noteworthy, write per-tick reflections) - Table of all 6 hooks active in this repo - List of skills and how to invoke them - Standing rules (inviolable) consolidated for the agent This block auto-loads into every conversation context — free behavior change without me remembering to opt in. ## .gitignore audit.jsonl, freeze, judge-subagents, per-tick-reflections.md are all local operational state, never committed. ## Verification - echo '{"tool_input":{"command":"git push --force origin main"}}' \| bash pre-bash-careful.sh → emits deny JSON ✓ - Same for git status (safe command) → empty output, exit 0 ✓ - pre-edit-freeze with .claude/freeze=platform/handlers/ blocks edits to PLAN.md, allows edits inside the locked path ✓ - post-edit-audit appends valid JSONL ✓ - session-start-context emits additionalContext with PR/issue counts ✓ - user-prompt-tag emits warning for "force push to main" prompt ✓ - python3 -c "json.load(open('.claude/settings.json'))" → valid ✓ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 12:00:35 -07:00
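The refuse/warn/allow decision made by a hook like pre-bash-careful can be sketched as below. The regular expressions, the `decide` and `handle` names, and the exact deny payload are assumptions made for illustration; the commit message only states that the hook reads `{"tool_input": {"command": ...}}`, emits deny JSON for destructive commands, and prints nothing for safe ones. Check the harness documentation for the payload shape actually expected.

```python
import json
import re
import sys

# Hypothetical pattern lists; the real hook's rules are not shown in the log.
REFUSE = [
    (re.compile(r"\bgit\s+push\b(?=.*\s--force(\s|$))(?=.*\bmain\b)"), "force-push to main"),
    (re.compile(r"\brm\s+-\w*r\w*\s+/(\s|$)"), "recursive rm at filesystem root"),
]
WARN = [
    (re.compile(r"--force-with-lease"), "force-with-lease push"),
    (re.compile(r"\bgh\s+(pr|issue)\s+close\b"), "closing a PR or issue"),
]


def decide(command):
    """Classify a shell command as 'deny', 'warn', or 'allow'."""
    for pattern, reason in REFUSE:
        if pattern.search(command):
            return "deny", reason
    for pattern, reason in WARN:
        if pattern.search(command):
            return "warn", reason
    return "allow", ""


def handle(raw):
    """Take the hook's stdin text and return the text to print on stdout."""
    command = json.loads(raw).get("tool_input", {}).get("command", "")
    verdict, reason = decide(command)
    if verdict == "deny":
        # Assumed payload shape for a PreToolUse denial.
        return json.dumps({
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "permissionDecision": "deny",
                "permissionDecisionReason": reason,
            }
        })
    if verdict == "warn":
        print(f"careful: {reason}", file=sys.stderr)
    return ""  # empty output lets the command proceed
```

A wrapper script would call `handle(sys.stdin.read())` and print the result, which mirrors the `echo ... | hook` verification steps listed in the commit.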
Hongming Wang	239883920e	feat(.claude): 5 gstack-inspired skills + cron upgrades Research on garrytan/gstack surfaced 5 patterns worth importing into our cron / agent setup. These are skills, not platform code — they guide how the cron and our own subagents work, not what the platform does at runtime. ## New skills 1. cross-vendor-review — adversarial second-model review for noteworthy PRs (auth, billing, data deletion, migrations). Catches the 15-30% of bugs single-model review misses. Inspired by gstack's /codex. 2. careful-mode — REFUSE/WARN/ALLOW lists for destructive commands. Refuses force-push to main, blocks merging draft PRs, prevents rm -rf outside scratch dirs. Inspired by gstack's /careful + /freeze. 3. cron-learnings — per-project JSONL of operational learnings appended at the end of every tick, replayed at the start of the next. Stops the cron from re-litigating decided issues. Inspired by gstack's /learn. 4. cron-retro — weekly retrospective auto-posted as a GitHub issue. Sunday 23:07 local. Tracks PR count, time-to-merge, gate failure trends, code-review severity over time. Inspired by gstack's /retro. 5. llm-judge — cheap LLM-as-judge eval to catch "agent shipped the wrong thing" — the failure mode unit tests miss. Plug into issue-pickup pipeline so worker-agent draft PRs get scored before being marked ready. Inspired by gstack's tier-3 test infra. 
## Cron updates (session-only, c5074cd5 + 060d136c) - Hourly triage cron now opens with careful-mode activation + cron-learnings replay (Step 0) - code-review skill on every PR being considered for merge (Step 2 supplement A — already present, formalized) - cross-vendor-review on noteworthy PRs (Step 2 supplement B — new) - llm-judge on issue-pickup draft PRs before marking ready (Step 4) - Status report now includes cross-vendor pass/fail and llm-judge scores (Step 5) - End-of-tick cron-learnings append (Step 5) - New weekly cron at Sun 23:07 invokes the cron-retro skill ## What we did NOT take from gstack - Their browser fork — not our product - The 23 named roles — we have agent role templates already - Bun toolchain — adds yet another runtime to our stack - /design-shotgun and design-tool variants — we're not a design tool - /document-release — our update-docs skill already covers this See PR description for full research notes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 11:36:55 -07:00
Hongming Wang	9d914193d2	feat(.claude): 5 gstack-inspired skills + cron upgrades Research on garrytan/gstack surfaced 5 patterns worth importing into our cron / agent setup. These are skills, not platform code — they guide how the cron and our own subagents work, not what the platform does at runtime. ## New skills 1. cross-vendor-review — adversarial second-model review for noteworthy PRs (auth, billing, data deletion, migrations). Catches the 15-30% of bugs single-model review misses. Inspired by gstack's /codex. 2. careful-mode — REFUSE/WARN/ALLOW lists for destructive commands. Refuses force-push to main, blocks merging draft PRs, prevents rm -rf outside scratch dirs. Inspired by gstack's /careful + /freeze. 3. cron-learnings — per-project JSONL of operational learnings appended at the end of every tick, replayed at the start of the next. Stops the cron from re-litigating decided issues. Inspired by gstack's /learn. 4. cron-retro — weekly retrospective auto-posted as a GitHub issue. Sunday 23:07 local. Tracks PR count, time-to-merge, gate failure trends, code-review severity over time. Inspired by gstack's /retro. 5. llm-judge — cheap LLM-as-judge eval to catch "agent shipped the wrong thing" — the failure mode unit tests miss. Plug into issue-pickup pipeline so worker-agent draft PRs get scored before being marked ready. Inspired by gstack's tier-3 test infra. 
## Cron updates (session-only, c5074cd5 + 060d136c) - Hourly triage cron now opens with careful-mode activation + cron-learnings replay (Step 0) - code-review skill on every PR being considered for merge (Step 2 supplement A — already present, formalized) - cross-vendor-review on noteworthy PRs (Step 2 supplement B — new) - llm-judge on issue-pickup draft PRs before marking ready (Step 4) - Status report now includes cross-vendor pass/fail and llm-judge scores (Step 5) - End-of-tick cron-learnings append (Step 5) - New weekly cron at Sun 23:07 invokes the cron-retro skill ## What we did NOT take from gstack - Their browser fork — not our product - The 23 named roles — we have agent role templates already - Bun toolchain — adds yet another runtime to our stack - /design-shotgun and design-tool variants — we're not a design tool - /document-release — our update-docs skill already covers this See PR description for full research notes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 11:36:55 -07:00
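The cron-learnings pattern (append at the end of a tick, replay at the start of the next) might look like the sketch below. The record fields `ts`, `tick`, and `note` and both function names are hypothetical; the log only says the learnings live in a per-project JSONL file, and a later commit mentions loading the last 20 entries.

```python
import json
import os
from datetime import datetime, timezone


def append_learning(path, note, tick):
    """Append one learning as a single JSON line (field names are assumed)."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "tick": tick,
        "note": note,
    }
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def replay_learnings(path, limit=20):
    """Return the most recent `limit` learnings, oldest first."""
    if not os.path.exists(path):
        return []  # first tick: nothing to replay yet
    with open(path, encoding="utf-8") as handle:
        lines = [line for line in handle if line.strip()]
    records = []
    for line in lines[-limit:]:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip a corrupt line instead of failing the tick
    return records
```

An append-only file keeps each tick's write cheap and makes the history easy to inspect with ordinary text tools.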
Hongming Wang	34fb3fd471	feat(provisioner): configurable per-tier memory/CPU limits (#14) Resolves #14. ApplyTierConfig now reads TIER{2,3,4}_MEMORY_MB and TIER{2,3,4}_CPU_SHARES env vars, falling back to the compiled defaults agreed in the issue: - T2: 512 MiB / 1024 shares (1 CPU) — unchanged baseline - T3: 2048 MiB / 2048 shares (2 CPU) — new cap (previously uncapped) - T4: 4096 MiB / 4096 shares (4 CPU) — new cap (previously uncapped) CPU_SHARES follows Docker's 1024 = 1 CPU convention; internally the value is translated to NanoCPUs for a hard allocation so behaviour remains deterministic across hosts. Malformed or non-positive env values silently fall back to the default. Behaviour change note: T3 and T4 previously had no explicit cap. Operators who relied on unlimited can set very large TIERn_MEMORY_MB / TIERn_CPU_SHARES values; a follow-up can add unset-means-unlimited semantics if required. Tests: - TestGetTierMemoryMB_DefaultsMatchLegacy - TestGetTierMemoryMB_EnvOverride (covers malformed + zero fallback) - TestGetTierCPUShares_EnvOverride - TestApplyTierConfig_T3_UsesEnvOverride (wiring) - TestApplyTierConfig_T3_DefaultCap (documents the new cap) Docs: .env.example section + CLAUDE.md platform env-vars list updated. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:49:37 -07:00
Hongming Wang	479f1776a8	feat(provisioner): configurable per-tier memory/CPU limits (#14) Resolves #14. ApplyTierConfig now reads TIER{2,3,4}_MEMORY_MB and TIER{2,3,4}_CPU_SHARES env vars, falling back to the compiled defaults agreed in the issue: - T2: 512 MiB / 1024 shares (1 CPU) — unchanged baseline - T3: 2048 MiB / 2048 shares (2 CPU) — new cap (previously uncapped) - T4: 4096 MiB / 4096 shares (4 CPU) — new cap (previously uncapped) CPU_SHARES follows Docker's 1024 = 1 CPU convention; internally the value is translated to NanoCPUs for a hard allocation so behaviour remains deterministic across hosts. Malformed or non-positive env values silently fall back to the default. Behaviour change note: T3 and T4 previously had no explicit cap. Operators who relied on unlimited can set very large TIERn_MEMORY_MB / TIERn_CPU_SHARES values; a follow-up can add unset-means-unlimited semantics if required. Tests: - TestGetTierMemoryMB_DefaultsMatchLegacy - TestGetTierMemoryMB_EnvOverride (covers malformed + zero fallback) - TestGetTierCPUShares_EnvOverride - TestApplyTierConfig_T3_UsesEnvOverride (wiring) - TestApplyTierConfig_T3_DefaultCap (documents the new cap) Docs: .env.example section + CLAUDE.md platform env-vars list updated. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:49:37 -07:00
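The env-override and fallback rules for tier limits can be illustrated with a short sketch. It is written in Python for brevity, not in the provisioner's own language, and every function name here is hypothetical. The defaults and the "malformed or non-positive falls back" rule come from the commit message; the shares-to-NanoCPUs formula is an assumption derived from the stated 1024 = 1 CPU convention.

```python
import os

# (memory MiB, CPU shares) defaults per tier, as listed in the commit message.
DEFAULTS = {2: (512, 1024), 3: (2048, 2048), 4: (4096, 4096)}


def _positive_int(raw, default):
    """Parse raw as a positive integer, falling back silently otherwise."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    return value if value > 0 else default


def tier_memory_mb(tier, env=os.environ):
    return _positive_int(env.get(f"TIER{tier}_MEMORY_MB"), DEFAULTS[tier][0])


def tier_cpu_shares(tier, env=os.environ):
    return _positive_int(env.get(f"TIER{tier}_CPU_SHARES"), DEFAULTS[tier][1])


def shares_to_nano_cpus(shares):
    """Assumed conversion: 1024 shares = 1 CPU = 1e9 NanoCPUs."""
    return shares * 1_000_000_000 // 1024
```

For example, `tier_memory_mb(3, {"TIER3_MEMORY_MB": "8192"})` yields the override, while an unset, zero, or non-numeric value yields the 2048 MiB default.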
Hongming Wang	4ff65b82c7	fix(provisioner): preserve Claude session directory across restart (#12) Resolves #12. The claude-code SDK stores conversations in /root/.claude/sessions/ and Postgres tracks current_session_id, but the container filesystem was recreated on every restart — next agent message failed with "No conversation found with session ID: <uuid>". Add a per-workspace named Docker volume (ws-<id>-claude-sessions) mounted read-write at /root/.claude/sessions. Gated by runtime=claude-code so other runtimes don't pay for a path they don't use. Volume is cleaned up in RemoveVolume alongside the config volume. Two opt-outs discard the volume before restart for a fresh session: - env WORKSPACE_RESET_SESSION=1 on the container - POST /workspaces/:id/restart?reset=true (or {"reset": true} body) Plumbed via new ResetClaudeSession field on WorkspaceConfig + provisionWorkspaceOpts helper so the flag stays request-scoped (not persisted on CreateWorkspacePayload). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:45:30 -07:00
Hongming Wang	7ad3173c10	fix(provisioner): preserve Claude session directory across restart (#12) Resolves #12. The claude-code SDK stores conversations in /root/.claude/sessions/ and Postgres tracks current_session_id, but the container filesystem was recreated on every restart — next agent message failed with "No conversation found with session ID: <uuid>". Add a per-workspace named Docker volume (ws-<id>-claude-sessions) mounted read-write at /root/.claude/sessions. Gated by runtime=claude-code so other runtimes don't pay for a path they don't use. Volume is cleaned up in RemoveVolume alongside the config volume. Two opt-outs discard the volume before restart for a fresh session: - env WORKSPACE_RESET_SESSION=1 on the container - POST /workspaces/:id/restart?reset=true (or {"reset": true} body) Plumbed via new ResetClaudeSession field on WorkspaceConfig + provisionWorkspaceOpts helper so the flag stays request-scoped (not persisted on CreateWorkspacePayload). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:45:30 -07:00
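The volume naming, runtime gating, and reset opt-outs from the commit above can be summarised in a few lines. This is a sketch under stated assumptions: the function names and the dictionary shapes for the mount, query, and body are hypothetical, and it is written in Python for illustration only. The volume name pattern, mount path, gating runtime, and the three reset triggers are taken from the commit message.

```python
SESSIONS_PATH = "/root/.claude/sessions"  # mount target named in the commit


def session_volume_name(workspace_id):
    return f"ws-{workspace_id}-claude-sessions"


def session_mount(workspace_id, runtime):
    """Return a mount description, or None for runtimes that do not need it."""
    if runtime != "claude-code":
        return None
    return {
        "volume": session_volume_name(workspace_id),
        "target": SESSIONS_PATH,
        "read_only": False,
    }


def wants_reset(container_env, query, body):
    """True when any opt-out asks for a fresh session before restart."""
    return (
        container_env.get("WORKSPACE_RESET_SESSION") == "1"
        or query.get("reset") == "true"
        or body.get("reset") is True
    )
```

Computing the reset decision from the request, instead of storing it, matches the stated goal of keeping the flag request-scoped.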
Hongming Wang	a16a25b1f1	docs: sync documentation with 2026-04-14 tick-3 merges (#53, #54, #55) - docs/edit-history/2026-04-14.md: append tick-3 section covering the admin test-token route (#53), the prior-tick doc-sync PR (#54), and the hermes required_env alignment (#55). Record measured test counts (Go +4 for the TestAdminTestToken_* quartet). - CLAUDE.md: bump Go test count 695 → 699 with a note pointing at the new quartet. Route-table row and env-var mentions for the admin route already landed with #53; verified on main. - .env.example: add MOLECULE_ENABLE_TEST_TOKENS with a comment about the prod-hidden default. Closes the code-review doc-sync flag from #53 (var was in CLAUDE.md but missing from .env.example). No PLAN.md / README.md / README.zh-CN.md update needed — none of the three merges expose a user-visible surface. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:37:42 -07:00
Hongming Wang	dcf8a07887	docs: sync documentation with 2026-04-14 tick-3 merges (#53, #54, #55) - docs/edit-history/2026-04-14.md: append tick-3 section covering the admin test-token route (#53), the prior-tick doc-sync PR (#54), and the hermes required_env alignment (#55). Record measured test counts (Go +4 for the TestAdminTestToken_* quartet). - CLAUDE.md: bump Go test count 695 → 699 with a note pointing at the new quartet. Route-table row and env-var mentions for the admin route already landed with #53; verified on main. - .env.example: add MOLECULE_ENABLE_TEST_TOKENS with a comment about the prod-hidden default. Closes the code-review doc-sync flag from #53 (var was in CLAUDE.md but missing from .env.example). No PLAN.md / README.md / README.zh-CN.md update needed — none of the three merges expose a user-visible surface. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:37:42 -07:00
Hongming Wang	1ab65ca736	Merge pull request #53 from Molecule-AI/feat/issue-6-admin-test-token feat(platform): GET /admin/workspaces/:id/test-token for E2E (#6)	2026-04-14 10:33:59 -07:00
Hongming Wang	639c32045d	Merge pull request #53 from Molecule-AI/feat/issue-6-admin-test-token feat(platform): GET /admin/workspaces/:id/test-token for E2E (#6)	2026-04-14 10:33:59 -07:00
Hongming Wang	585fcfc1ea	Merge pull request #55 from Molecule-AI/fix/hermes-config-env-mismatch fix(hermes): align config.yaml required_env with executor (HERMES_API_KEY)	2026-04-14 10:29:06 -07:00
Hongming Wang	0485585031	Merge pull request #55 from Molecule-AI/fix/hermes-config-env-mismatch fix(hermes): align config.yaml required_env with executor (HERMES_API_KEY)	2026-04-14 10:29:06 -07:00
Hongming Wang	1adfb77be2	Merge pull request #54 from Molecule-AI/docs/sync-2026-04-14-tick-2 docs: sync documentation with 2026-04-14 tick-2 merges (#50, #52)	2026-04-14 10:28:43 -07:00
Hongming Wang	c9f0a915c1	Merge pull request #54 from Molecule-AI/docs/sync-2026-04-14-tick-2 docs: sync documentation with 2026-04-14 tick-2 merges (#50, #52)	2026-04-14 10:28:43 -07:00
Hongming Wang	2e2261ab9c	fix(hermes): align config.yaml required_env with executor (HERMES_API_KEY) The hermes config required NOUS_API_KEY but the executor (workspace-template/adapters/hermes/executor.py from PR #49) checks HERMES_API_KEY and OPENROUTER_API_KEY. A workspace created from this template would have the provisioner block on a missing NOUS_API_KEY even when HERMES_API_KEY was set, or pass provisioning but fail at executor init. .env.example already documents HERMES_API_KEY. Fix: rename the required_env entry to HERMES_API_KEY and update the comments to match the executor's actual fallback order (HERMES_API_KEY first, OPENROUTER_API_KEY second). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:19:55 -07:00
Hongming Wang	fd9e603f29	fix(hermes): align config.yaml required_env with executor (HERMES_API_KEY) The hermes config required NOUS_API_KEY but the executor (workspace-template/adapters/hermes/executor.py from PR #49) checks HERMES_API_KEY and OPENROUTER_API_KEY. A workspace created from this template would have the provisioner block on a missing NOUS_API_KEY even when HERMES_API_KEY was set, or pass provisioning but fail at executor init. .env.example already documents HERMES_API_KEY. Fix: rename the required_env entry to HERMES_API_KEY and update the comments to match the executor's actual fallback order (HERMES_API_KEY first, OPENROUTER_API_KEY second). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:19:55 -07:00
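The fallback order described in the hermes fix (HERMES_API_KEY first, OPENROUTER_API_KEY second) can be sketched as below. The function name `resolve_api_key` and the error type are hypothetical and do not reflect the contents of `executor.py`; only the two variable names and their order come from the commit message.

```python
KEY_ORDER = ("HERMES_API_KEY", "OPENROUTER_API_KEY")  # order stated in the commit


def resolve_api_key(env):
    """Return (variable_name, value) for the first non-empty key in order."""
    for name in KEY_ORDER:
        value = env.get(name, "").strip()
        if value:
            return name, value
    # Assumed failure mode: the real executor's error is not shown in the log.
    raise RuntimeError("set one of: " + ", ".join(KEY_ORDER))
```

Keeping the provisioner's `required_env` aligned with this lookup is what prevents a workspace from passing provisioning and then failing at executor init.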
Hongming Wang	8f87acc9df	docs: sync documentation with 2026-04-14 tick-2 merges (#50, #52) Two template-only merges this tick, both editing org-templates/molecule-dev/org.yaml: - #50 PM system prompt — audit summaries are dispatch triggers - #52 UIUX Designer cron installs playwright-chromium (closes #23) No code / env / API / test-count drift. Only docs/edit-history/2026-04-14.md created. CLAUDE.md, PLAN.md, README.md, README.zh-CN.md intentionally untouched. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 09:37:24 -07:00
Hongming Wang	35aa945164	docs: sync documentation with 2026-04-14 tick-2 merges (#50, #52) Two template-only merges this tick, both editing org-templates/molecule-dev/org.yaml: - #50 PM system prompt — audit summaries are dispatch triggers - #52 UIUX Designer cron installs playwright-chromium (closes #23) No code / env / API / test-count drift. Only docs/edit-history/2026-04-14.md created. CLAUDE.md, PLAN.md, README.md, README.zh-CN.md intentionally untouched. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 09:37:24 -07:00
Hongming Wang	496dee8e13	feat(platform): GET /admin/workspaces/:id/test-token for E2E (#6) Adds a gated admin endpoint that mints a fresh workspace bearer token on demand, eliminating the register-race currently used by test_comprehensive_e2e.sh (PR #5 follow-up). - New handler admin_test_token.go: returns 404 unless MOLECULE_ENV != production or MOLECULE_ENABLE_TEST_TOKENS=1. Hides route existence in prod (404 not 403). - Mints via wsauth.IssueToken; logs at INFO without the token itself. - Verifies workspace exists before minting (missing -> 404, never 500). - Tests cover prod-hidden, enable-flag-overrides-prod, missing workspace, and happy-path + token-validates round trip. - tests/e2e/_lib.sh gains e2e_mint_test_token helper for downstream adoption. - CLAUDE.md updated with route + env vars. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 09:35:26 -07:00
Hongming Wang	0832f997f0	feat(platform): GET /admin/workspaces/:id/test-token for E2E (#6) Adds a gated admin endpoint that mints a fresh workspace bearer token on demand, eliminating the register-race currently used by test_comprehensive_e2e.sh (PR #5 follow-up). - New handler admin_test_token.go: returns 404 unless MOLECULE_ENV != production or MOLECULE_ENABLE_TEST_TOKENS=1. Hides route existence in prod (404 not 403). - Mints via wsauth.IssueToken; logs at INFO without the token itself. - Verifies workspace exists before minting (missing -> 404, never 500). - Tests cover prod-hidden, enable-flag-overrides-prod, missing workspace, and happy-path + token-validates round trip. - tests/e2e/_lib.sh gains e2e_mint_test_token helper for downstream adoption. - CLAUDE.md updated with route + env vars. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 09:35:26 -07:00
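The gating rule for the test-token route reduces to a small truth table, sketched here in Python for illustration. The function names are hypothetical and this is not the handler in `admin_test_token.go`; the rule itself (404 unless the environment is not production or the enable flag is set, and 404 for a missing workspace) is taken from the commit message.

```python
def tokens_enabled(env):
    """Route is live outside production, or anywhere the flag is set to 1."""
    return (
        env.get("MOLECULE_ENV") != "production"
        or env.get("MOLECULE_ENABLE_TEST_TOKENS") == "1"
    )


def status_for(env, workspace_exists):
    """HTTP status the endpoint would return before minting a token."""
    if not tokens_enabled(env):
        return 404  # hide the route's existence instead of answering 403
    if not workspace_exists:
        return 404  # missing workspace is a client error, never a 500
    return 200
```

Returning 404 in both cases means a production caller cannot distinguish a disabled route from an unknown workspace.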
Hongming Wang	b99495b2df	Merge pull request #52 from Molecule-AI/chore/template-uiux-chromium-recipe closes #23	2026-04-14 09:32:16 -07:00
Hongming Wang	347faab6df	Merge pull request #52 from Molecule-AI/chore/template-uiux-chromium-recipe closes #23	2026-04-14 09:32:16 -07:00
Hongming Wang	018eb7f4fd	Merge pull request #50 from Molecule-AI/chore/template-pm-dispatcher chore(template): PM system prompt — treat audit summaries as dispatch triggers, not FYIs	2026-04-14 09:32:08 -07:00
Hongming Wang	14fc30f87d	Merge pull request #50 from Molecule-AI/chore/template-pm-dispatcher chore(template): PM system prompt — treat audit summaries as dispatch triggers, not FYIs	2026-04-14 09:32:08 -07:00
rabbitblood	65b01cef83	chore(template): bake working Chromium recipe into UIUX Designer cron (closes #23) UIUX Designer figured out at runtime (Run 6, 2026-04-14) how to get Playwright working without a Dockerfile change: LD_LIBRARY_PATH="/home/agent/.cache/ms-playwright/firefox-1509/firefox" node script.cjs Using @sparticuz/chromium + puppeteer-core, and borrowing the NSS/NSPR libs bundled with Playwright's Firefox binary. This resolves every missing lib on the container without needing apt-get or image rebuild. Agent memory persists the trick across restarts, but a fresh org-template import (new user) would have to rediscover it. Baking the recipe into the cron prompt so every clone inherits day-one screenshot capability. Evidence it works (from Run 6 memory): - 14 screenshots captured and vision-analysed - Found 2 new criticals (C4 onboarding-guide a11y, C5 settings panel white refresh button confirmed in production) that only surface via live DOM - Full user-flow coverage: home → create → settings → help → templates → mobile 375 → responsive 1280 Replaces the previous "best-effort + fall back to HTML" wording with a specific, proven command path. Falls back on HTML only if the browser genuinely won't launch (e.g. host.docker.internal:3000 down). Template-level fix; the general platform-level path would be to ship these libs in the workspace-template image directly (future Dockerfile change — out of scope here).	2026-04-14 09:01:03 -07:00
rabbitblood	40158c3753	chore(template): bake working Chromium recipe into UIUX Designer cron (closes #23 ) UIUX Designer figured out at runtime (Run 6, 2026-04-14) how to get Playwright working without a Dockerfile change: LD_LIBRARY_PATH="/home/agent/.cache/ms-playwright/firefox-1509/firefox" node script.cjs Using @sparticuz/chromium + puppeteer-core, and borrowing the NSS/NSPR libs bundled with Playwright's Firefox binary. This resolves every missing lib on the container without needing apt-get or image rebuild. Agent memory persists the trick across restarts, but a fresh org-template import (new user) would have to rediscover it. Baking the recipe into the cron prompt so every clone inherits day-one screenshot capability. Evidence it works (from Run 6 memory): - 14 screenshots captured and vision-analysed - Found 2 new criticals (C4 onboarding-guide a11y, C5 settings panel white refresh button confirmed in production) that only surface via live DOM - Full user-flow coverage: home → create → settings → help → templates → mobile 375 → responsive 1280 Replaces the previous "best-effort + fall back to HTML" wording with a specific, proven command path. Falls back on HTML only if the browser genuinely won't launch (e.g. host.docker.internal:3000 down). Template-level fix; the general platform-level path would be to ship these libs in the workspace-template image directly (future Dockerfile change — out of scope here).	2026-04-14 09:01:03 -07:00
Hongming Wang	cd4eb9c590	Merge pull request #49 from Molecule-AI/feat/hermes-pr2 feat(hermes): implement create_executor() with HERMES_API_KEY / OPENROUTER_API_KEY fallback + smoke tests	2026-04-14 08:16:15 -07:00
Hongming Wang	a2ea1b183b	Merge pull request #49 from Molecule-AI/feat/hermes-pr2 feat(hermes): implement create_executor() with HERMES_API_KEY / OPENROUTER_API_KEY fallback + smoke tests	2026-04-14 08:16:15 -07:00
rabbitblood	4ac6cdb293	chore(template): PM system prompt — treat audit summaries as dispatch triggers, not FYIs Observed 2026-04-14 morning: audit crons (Security, UIUX, QA) were flowing messages into PM per the PR #26 contract, but PM stopped sub-delegating to Dev Lead ~10 hours ago. Meanwhile audits started opening PRs directly (bypassing Dev Lead), and Dev Lead / BE / FE / DevOps / QA sat idle for 17+ maintenance cycles despite PRs continuing to land. Root cause: PM's system prompt defined delegation behavior for "tasks from CEO" but didn't explicitly treat audit summaries as tasks. PM was reading "audit of SHA X, filed issue #N, top recommendation: fix Y" as a status report and committing it to memory without triggering the dispatch chain. Adds a dedicated "Audit Routing" section to PM's prompt that: - Treats every audit summary with open issue numbers as a dispatch trigger - Specifies routing by category (security→BE, ui→FE, infra→DevOps, qa→QA) - Requires parallel `delegate_task_async` when issues span categories - Makes clean-cycle acks the only no-op case This turns PM from a receptionist into a dispatcher — which was the original intent of the audit-routing contract in #26. Aligns with the north-star goal (keep the team running 24/7): dead idle windows when audits had live issue numbers is a defect in orchestration, not a quiet period.	2026-04-14 08:13:42 -07:00
rabbitblood	3beb09df03	chore(template): PM system prompt — treat audit summaries as dispatch triggers, not FYIs Observed 2026-04-14 morning: audit crons (Security, UIUX, QA) were flowing messages into PM per the PR #26 contract, but PM stopped sub-delegating to Dev Lead ~10 hours ago. Meanwhile audits started opening PRs directly (bypassing Dev Lead), and Dev Lead / BE / FE / DevOps / QA sat idle for 17+ maintenance cycles despite PRs continuing to land. Root cause: PM's system prompt defined delegation behavior for "tasks from CEO" but didn't explicitly treat audit summaries as tasks. PM was reading "audit of SHA X, filed issue #N, top recommendation: fix Y" as a status report and committing it to memory without triggering the dispatch chain. Adds a dedicated "Audit Routing" section to PM's prompt that: - Treats every audit summary with open issue numbers as a dispatch trigger - Specifies routing by category (security→BE, ui→FE, infra→DevOps, qa→QA) - Requires parallel `delegate_task_async` when issues span categories - Makes clean-cycle acks the only no-op case This turns PM from a receptionist into a dispatcher — which was the original intent of the audit-routing contract in #26. Aligns with the north-star goal (keep the team running 24/7): dead idle windows when audits had live issue numbers is a defect in orchestration, not a quiet period.	2026-04-14 08:13:42 -07:00
Hongming Wang	fad03b7db3	Merge pull request #48 from Molecule-AI/fix/issue-17-rogue-restart-loop fix(provisioner): stop rogue config-missing restart loop (#17)	2026-04-14 08:12:30 -07:00
Hongming Wang	cc9f181e8d	Merge pull request #48 from Molecule-AI/fix/issue-17-rogue-restart-loop fix(provisioner): stop rogue config-missing restart loop (#17)	2026-04-14 08:12:30 -07:00
Hongming Wang	9255ba2ada	docs(hermes): document HERMES_API_KEY env var and runtime-table row Adds HERMES_API_KEY to .env.example with a cross-reference to the OPENROUTER_API_KEY fallback, and adds the hermes runtime row to the CLAUDE.md runtime table so the new adapter is discoverable alongside its siblings (langgraph, claude-code, openclaw, crewai, autogen, deepagents). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 08:11:37 -07:00
Hongming Wang	56068a7698	docs(hermes): document HERMES_API_KEY env var and runtime-table row Adds HERMES_API_KEY to .env.example with a cross-reference to the OPENROUTER_API_KEY fallback, and adds the hermes runtime row to the CLAUDE.md runtime table so the new adapter is discoverable alongside its siblings (langgraph, claude-code, openclaw, crewai, autogen, deepagents). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 08:11:37 -07:00
Hongming Wang	41c4d0a2ac	Merge pull request #47 from Molecule-AI/fix/issue-13-workspace-chown fix(workspace): chown /workspace when root-owned bind mount (#13)	2026-04-14 08:10:58 -07:00
Hongming Wang	af54fe89de	Merge pull request #47 from Molecule-AI/fix/issue-13-workspace-chown fix(workspace): chown /workspace when root-owned bind mount (#13)	2026-04-14 08:10:58 -07:00
Hongming Wang	602f3ef685	fix(provisioner): stop rogue config-missing restart loop (#17) Resolves #17. Part A: scripts/cleanup-rogue-workspaces.sh deletes workspaces whose id or name starts with known test placeholder prefixes (aaaaaaaa-, etc.) and force-removes the paired Docker container. Documented in tests/README.md. Part B: add a pre-flight check in provisionWorkspace() — when neither a template path nor in-memory configFiles supplies config.yaml, probe the existing named volume via a throwaway alpine container. If the volume lacks config.yaml, mark the workspace status='failed' with a clear last_sample_error instead of handing it to Docker's unless-stopped restart policy (which otherwise loops forever on FileNotFoundError). New pure helper provisioner.ValidateConfigSource + unit tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 07:32:58 -07:00
Hongming Wang	f7683e3adf	fix(provisioner): stop rogue config-missing restart loop (#17) Resolves #17. Part A: scripts/cleanup-rogue-workspaces.sh deletes workspaces whose id or name starts with known test placeholder prefixes (aaaaaaaa-, etc.) and force-removes the paired Docker container. Documented in tests/README.md. Part B: add a pre-flight check in provisionWorkspace() — when neither a template path nor in-memory configFiles supplies config.yaml, probe the existing named volume via a throwaway alpine container. If the volume lacks config.yaml, mark the workspace status='failed' with a clear last_sample_error instead of handing it to Docker's unless-stopped restart policy (which otherwise loops forever on FileNotFoundError). New pure helper provisioner.ValidateConfigSource + unit tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 07:32:58 -07:00
Hongming Wang	b6c2f15933	fix(workspace): recursive chown when /workspace bind mount is root-owned (#13) On Docker Desktop (macOS/Windows), host-path bind mounts often appear root-owned inside the container. The previous entrypoint only chowned /workspace top-level, so agents (uid 1000) still couldn't write to /workspace/repo/* — git clone, pip install, and file edits failed with EACCES and fell back to /tmp. Detect the root-owned-contents case by sampling the first entry; if it's root-owned, recursively chown the tree. On normal Linux Docker with matching uids this is a no-op, so the fast-startup path is preserved for the common case. Part B of the issue (private-repo initial_prompt clone) was addressed by PR #20. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 07:29:30 -07:00
Hongming Wang	cb47e89aa8	fix(workspace): recursive chown when /workspace bind mount is root-owned (#13) On Docker Desktop (macOS/Windows), host-path bind mounts often appear root-owned inside the container. The previous entrypoint only chowned /workspace top-level, so agents (uid 1000) still couldn't write to /workspace/repo/* — git clone, pip install, and file edits failed with EACCES and fell back to /tmp. Detect the root-owned-contents case by sampling the first entry; if it's root-owned, recursively chown the tree. On normal Linux Docker with matching uids this is a no-op, so the fast-startup path is preserved for the common case. Part B of the issue (private-repo initial_prompt clone) was addressed by PR #20. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 07:29:30 -07:00
Hongming Wang	707ee08673	Merge pull request #43 from Molecule-AI/fix/reduced-motion fix(a11y): prefers-reduced-motion WCAG 2.3.3 compliance	2026-04-14 07:20:19 -07:00
Hongming Wang	5ab75532d0	Merge pull request #43 from Molecule-AI/fix/reduced-motion fix(a11y): prefers-reduced-motion WCAG 2.3.3 compliance	2026-04-14 07:20:19 -07:00
Hongming Wang	b2b1a88202	Merge pull request #45 from Molecule-AI/feat/zoom-to-team-shortcut feat(canvas): Z shortcut + help entry for double-click zoom-to-team	2026-04-14 07:19:23 -07:00
Hongming Wang	652fc31d9b	Merge pull request #45 from Molecule-AI/feat/zoom-to-team-shortcut feat(canvas): Z shortcut + help entry for double-click zoom-to-team	2026-04-14 07:19:23 -07:00
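
Note on the fix(workspace) commit for #13 listed above: the "sample the first entry, then recursively chown" check could look roughly like the sketch below. This is an illustration only, not the repository's entrypoint (which is not shown in this log). The function names, the injectable `chown` and `sample` parameters, and the choice of Python are all assumptions made for the example; only uid 1000 and the sampling idea come from the commit message.

```python
import os
from pathlib import Path

AGENT_UID = 1000  # the commit message says agents run as uid 1000


def first_entry_is_root_owned(root):
    """Sample only the first directory entry; True if uid 0 owns it."""
    with os.scandir(root) as entries:
        for entry in entries:
            return entry.stat(follow_symlinks=False).st_uid == 0
    return False  # empty directory: nothing to fix


def paths_to_chown(root):
    """Yield the root itself and every path beneath it."""
    yield Path(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield Path(dirpath) / name


def fix_ownership(root, uid=AGENT_UID, gid=AGENT_UID,
                  chown=os.lchown, sample=first_entry_is_root_owned):
    """Recursively chown only when the sampled entry is root-owned.

    Returns the number of paths changed. When the sample is not
    root-owned this returns 0 without walking the tree, which keeps
    the fast-startup path for the matching-uid case.
    """
    if not sample(root):
        return 0
    changed = 0
    for path in paths_to_chown(root):
        chown(path, uid, gid)  # lchown: do not follow symlinks
        changed += 1
    return changed
```

Passing `chown` and `sample` as parameters is a design choice of this sketch so the logic can be exercised without root privileges; a real entrypoint would need to run as root for the chown calls to succeed.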

4721 Commits