Commit Graph

24 Commits

Author SHA1 Message Date
Hongming Wang
855d423f6c review: split push steps, runbook for secret rotation, username clarity
Addresses PR #82 code review: 🟡×3 + 🔵×5.

- Fly registry login username: 'x' → 'molecule-ai' + explanatory comment.
- Build & push split into two steps (GHCR / Fly registry) so a single-
  registry outage can't fail the other. Second step uses 'if: always()'
  to ensure Fly mirror runs even if GHCR push flakes.
- docs/runbooks/saas-secrets.md: full secret map + rotation procedures
  for every SaaS credential, with danger-case callouts. Documents the
  coupled FLY_API_TOKEN (lives in GHA secret AND fly secrets — must be
  rotated in both).
- CLAUDE.md: new 'SaaS ops' section linking to the runbook.
2026-04-14 17:09:11 -07:00
Hongming Wang
292eb71c52 Merge pull request #80 from Molecule-AI/feat/ghcr-platform-image
feat(ci): publish-platform-image → ghcr.io/molecule-ai/platform (Phase B.2)
2026-04-14 16:41:59 -07:00
Hongming Wang
035287df38 feat(ci): publish-platform-image workflow → ghcr.io/molecule-ai/platform
Phase B.2 companion to the private molecule-controlplane provisioner PR.
On every push to main that touches platform/**, builds platform/Dockerfile
and pushes to GHCR with two tags:

- :latest              (floating, always main's tip)
- :sha-<short-commit>  (immutable, pin-friendly)

Cache via GitHub Actions cache (cache-from: type=gha). Workflow_dispatch
trigger so we can re-publish after a docs-only merge if needed.

The private molecule-controlplane sets TENANT_IMAGE=ghcr.io/molecule-ai/platform:<tag>
and the provisioner creates each tenant Fly Machine from this image. Staying
on the same base image across tenants keeps upgrades atomic.

CLAUDE.md updated to document the new workflow in the CI pipeline section.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:37:49 -07:00
Hongming Wang
75a1957874 docs: sync documentation with 2026-04-14 tick-8 merge (#78)
- CLAUDE.md: Go test count 740 → 746; MOLECULE_ORG_ID env var documented.
- PLAN.md: new "Recently launched (2026-04-14 tick-8)" block covering
  Phase 32 PR #1 + paired private molecule-controlplane repo scaffolding.
- docs/edit-history/2026-04-14.md: tick-8 breakdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:41:45 -07:00
Hongming Wang
284ef6d33a feat(platform): TenantGuard middleware — public repo's only SaaS hook
Phase 32 foundation. The SaaS control plane (private molecule-controlplane
repo) provisions one platform instance per customer org on Fly Machines
and sets MOLECULE_ORG_ID=<uuid> on the machine. Its subdomain router
forwards requests with X-Molecule-Org-Id=<uuid>.

TenantGuard:
- When MOLECULE_ORG_ID is set → every non-allowlisted request must carry a
  matching X-Molecule-Org-Id header. Mismatched/missing header → 404 (not
  403 — don't leak tenant existence by letting probers distinguish "wrong
  org" from "route doesn't exist").
- When unset → passthrough. Self-hosted / dev / CI behavior unchanged.
- Allowlist is exact-match, not prefix — /health and /metrics only.

No orgs table, no signup, no billing, no Fly provisioning in this repo —
all that lives in the private control plane. The public repo's SaaS
surface is exactly this one middleware.

6 tests covering: unset-is-passthrough, matching header, mismatched
header 404 (with empty body), missing header 404, allowlist bypass, and
allowlist-is-exact-match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:20:33 -07:00
Hongming Wang
cd5498c8dd docs: sync documentation with 2026-04-14 tick-7 merges (#74, #75, #76)
- CLAUDE.md: Go test count 731 → 740; migration count 16 → 23;
  workspace_schedules.source column documented in Database section.
- PLAN.md: new "Recently launched (2026-04-14 tick-7)" section for
  PRs #74/#75/#76 and closed issues #24/#51.
- docs/edit-history/2026-04-14.md: per-PR breakdown of tick-7 merges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:43:16 -07:00
Hongming Wang
3a105fa1cb docs: sync documentation with 2026-04-14 tick-6 merges (#71, #72)
- docs/edit-history/2026-04-14.md: append tick-6 covering PR #71 (plugins UNION) and PR #72 (tick-5 docs-sync)
- CLAUDE.md: Go test count 726 -> 731 (+5 TestPlugins_*); add Plugins section note on UNION + !/- opt-out semantics
- PLAN.md: add "Recently launched (2026-04-14 tick-6)" entry noting issue #68 is resolved by PR #71

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 13:45:02 -07:00
Hongming Wang
59a96e3888 docs: sync documentation with 2026-04-14 evening-tick merges (#63, #64, #65)
- edit-history/2026-04-14.md: append tick-4 section covering the 12
  modular guardrail plugins (#63), global-secrets auto-restart fan-out
  (#64, fixes issue #15), and synthetic restart-context A2A message
  (#65, fixes issue #19 Layer 1; Layer 2 deferred to issue #66).
- CLAUDE.md: bump Go test count 699 -> 726 (measured); note global
  secrets auto-restart on SetGlobal/DeleteGlobal in the route table;
  add Workspace Lifecycle paragraph for the restart-context message
  and its system:restart-context caller prefix.
- PLAN.md: bump Go test count in the coverage table; record issues
  #15 and #19 Layer 1 as launched; add new Backlog entry for the
  Layer 2 follow-up (issue #66).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:54:04 -07:00
Hongming Wang
d5806440d5 feat(plugins): split guardrails into 12 modular plugins (#63)
Noteworthy: large-addition (+1601 lines, 12 new plugins) + modifies core AgentskillsAdaptor (SDK + runtime copies, drift-guarded). All 7 gates pass, 0 critical findings. Cross-vendor review skipped (tool unavailable).
2026-04-14 12:47:24 -07:00
Hongming Wang
b0d779e4b4 Merge pull request #61 from Molecule-AI/feat/claude-hooks-upgrade
feat(.claude): ambient hooks + sequential-thinking MCP + /triage command
2026-04-14 12:25:54 -07:00
Hongming Wang
119b02c544 feat(plugins): split guardrails into 12 modular plugins
Replaces the proposed monolithic molecule-guardrails plugin with 12
single-purpose plugins users can install à la carte. Powered by a
small extension to the AgentskillsAdaptor base class so any plugin can
ship hooks/, commands/, and a settings-fragment.json without writing a
custom adapter.

## Base adapter changes

workspace-template/plugins_registry/builtins.py + sdk/python/molecule_plugin/builtins.py
(both copies — drift-tested):
- New _install_claude_layer() helper called at the end of install()
- Conditionally copies hooks/ → /configs/.claude/hooks/ (preserving exec bit)
- Conditionally copies commands/*.md → /configs/.claude/commands/
- Conditionally merges settings-fragment.json into /configs/.claude/settings.json
  with ${CLAUDE_DIR} placeholder rewritten to the workspace's absolute install
  path. Existing user hooks are preserved (deep-merge by event name).
- All steps no-op when the plugin doesn't ship the corresponding files,
  so existing skill+rule plugins (molecule-dev, superpowers, ecc,
  browser-automation) are unchanged.

Drift test (tests/test_plugins_builtins_drift.py) still passes.

## 12 new plugins

Hook plugins (ambient enforcement):
- molecule-careful-bash       — refuses destructive bash; ships careful-mode skill
- molecule-freeze-scope       — locks edits via .claude/freeze
- molecule-audit-trail        — appends every Edit/Write to audit.jsonl
- molecule-session-context    — auto-loads cron-learnings at session start
- molecule-prompt-watchdog    — injects warnings on destructive prompt keywords

Skill plugins (on-demand):
- molecule-skill-code-review        — 16-criteria multi-axis review
- molecule-skill-cross-vendor-review — adversarial second-model review
- molecule-skill-llm-judge          — deliverable-vs-request scoring
- molecule-skill-update-docs        — post-merge doc sync
- molecule-skill-cron-learnings     — operational-memory JSONL format

Workflow plugins (slash commands):
- molecule-workflow-triage  — /triage full PR-triage cycle
- molecule-workflow-retro   — /retro + cron-retro skill, weekly retrospective

Each ships only what it needs — most have just plugin.yaml + skills/ or
hooks/ + adapter (one-line stub: `from plugins_registry.builtins import
AgentskillsAdaptor as Adaptor`). Total ~120 files but each plugin is
small and self-contained.

## Verification

- python3 -m molecule_plugin validate plugins/molecule-* → all 13 valid
  (12 new + pre-existing molecule-dev)
- End-to-end install smoke test on representative samples: hook plugin
  (molecule-careful-bash), skill-only plugin (molecule-skill-code-review),
  workflow plugin (molecule-workflow-triage). All produce expected
  /configs/ tree, settings.json paths rewritten, exec bits preserved,
  zero warnings.
- workspace-template pytest tests/test_plugins_builtins_drift.py → passes
  (SDK + runtime stay in sync).

## CLAUDE.md repo-doc updated

Lists all 12 new plugins under the existing Plugins section, organized
by category (hook / skill / workflow). Each entry one line, recommend-
together hints where dependencies make sense.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:20:04 -07:00
Hongming Wang
eea36b9f92 feat(.claude): ambient hooks + sequential-thinking MCP + /triage command
Skills are opt-in (I have to remember to invoke them). Hooks are
ambient — they fire on every matching event automatically. This PR
moves the careful-mode and learnings discipline from "doc I should
read" to "harness-enforced behavior I cannot bypass".

## 6 new hooks (.claude/hooks/)

- pre-bash-careful — REFUSES git push --force to main, rm -rf at root,
  DROP TABLE against prod schema. WARNs on force-with-lease, gh pr/
  issue close. Tested: blocks the destructive case, allows safe ones.
- pre-edit-freeze — implements /freeze. When .claude/freeze contains
  a path glob, edits outside it are denied. Tested: edits to PLAN.md
  blocked when scope locked to platform/internal/handlers/.
- session-start-context — auto-loads last 20 cron-learnings, freeze
  status, open-PR/issue counts as additionalContext at session start.
  Tested: emits valid SessionStart JSON.
- post-edit-audit — appends every Edit/Write to .claude/audit.jsonl
  (gitignored). One-line records {ts, tool, file, ok}. Tested writes.
- user-prompt-tag — injects context warnings when prompt mentions
  force-push, drop-table, "delete all", "push to main", etc. Tested:
  emits warning for "force push the fix to main".
- subagent-stop-judge — off by default; touch .claude/judge-subagents
  to enable. When on, prompts orchestrator to verify subagent's last
  message addresses the original task. Cost-free MVP (no LLM call yet).

All hooks are Python (jq isn't on the hook PATH on macOS — Python is).
Shared helpers in _lib.py: read_input, deny_pretooluse, add_context,
warn_to_stderr.

## settings.json — wires all 6 hooks

Adds SessionStart, UserPromptSubmit, SubagentStop event handlers.
Existing PreToolUse:Bash + PostToolUse:Edit chains gain the new hooks
alongside the existing ones (check-inbox.sh, echo reminder).

Adds @modelcontextprotocol/server-sequential-thinking MCP server for
structured chain-of-thought scratchpad — useful when triaging multiple
PRs in parallel without losing context.

## .claude/commands/triage.md — slash command shortcut

Manual /triage runs the same flow as the c5074cd5 hourly cron, on
demand. Saves ~4KB of prompt every invocation by pulling the cron
prompt out of working memory.

## CLAUDE.md additions

New "Agent operating rules (auto-loaded — read first)" section right
after Ecosystem Context. Documents:
- Cron / triage discipline (read learnings, treat docs PRs touching
  CLAUDE.md/PLAN.md as noteworthy, write per-tick reflections)
- Table of all 6 hooks active in this repo
- List of skills and how to invoke them
- Standing rules (inviolable) consolidated for the agent

This block auto-loads into every conversation context — free behavior
change without me remembering to opt in.

## .gitignore

audit.jsonl, freeze, judge-subagents, per-tick-reflections.md are all
local operational state, never committed.

## Verification

- echo '{"tool_input":{"command":"git push --force origin main"}}' |
  bash pre-bash-careful.sh → emits deny JSON ✓
- Same for git status (safe command) → empty output, exit 0 ✓
- pre-edit-freeze with .claude/freeze=platform/handlers/ blocks
  edits to PLAN.md, allows edits inside the locked path ✓
- post-edit-audit appends valid JSONL ✓
- session-start-context emits additionalContext with PR/issue counts ✓
- user-prompt-tag emits warning for "force push to main" prompt ✓
- python3 -c "json.load(open('.claude/settings.json'))" → valid ✓

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:00:35 -07:00
Hongming Wang
34fb3fd471 feat(provisioner): configurable per-tier memory/CPU limits (#14)
Resolves #14. ApplyTierConfig now reads TIER{2,3,4}_MEMORY_MB and
TIER{2,3,4}_CPU_SHARES env vars, falling back to the compiled defaults
agreed in the issue:

  - T2: 512 MiB  / 1024 shares (1 CPU)  — unchanged baseline
  - T3: 2048 MiB / 2048 shares (2 CPU)  — new cap (previously uncapped)
  - T4: 4096 MiB / 4096 shares (4 CPU)  — new cap (previously uncapped)

CPU_SHARES follows Docker's 1024 = 1 CPU convention; internally the
value is translated to NanoCPUs for a hard allocation so behaviour
remains deterministic across hosts. Malformed or non-positive env
values silently fall back to the default.

Behaviour change note: T3 and T4 previously had no explicit cap.
Operators who relied on unlimited can set very large TIERn_MEMORY_MB /
TIERn_CPU_SHARES values; a follow-up can add unset-means-unlimited
semantics if required.

Tests:
  - TestGetTierMemoryMB_DefaultsMatchLegacy
  - TestGetTierMemoryMB_EnvOverride (covers malformed + zero fallback)
  - TestGetTierCPUShares_EnvOverride
  - TestApplyTierConfig_T3_UsesEnvOverride (wiring)
  - TestApplyTierConfig_T3_DefaultCap (documents the new cap)

Docs: .env.example section + CLAUDE.md platform env-vars list updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:49:37 -07:00
Hongming Wang
a16a25b1f1 docs: sync documentation with 2026-04-14 tick-3 merges (#53, #54, #55)
- docs/edit-history/2026-04-14.md: append tick-3 section covering the
  admin test-token route (#53), the prior-tick doc-sync PR (#54), and
  the hermes required_env alignment (#55). Record measured test counts
  (Go +4 for the TestAdminTestToken_* quartet).
- CLAUDE.md: bump Go test count 695 → 699 with a note pointing at the
  new quartet. Route-table row and env-var mentions for the admin
  route already landed with #53; verified on main.
- .env.example: add MOLECULE_ENABLE_TEST_TOKENS with a comment about
  the prod-hidden default. Closes the code-review doc-sync flag from
  #53 (var was in CLAUDE.md but missing from .env.example).

No PLAN.md / README.md / README.zh-CN.md update needed — none of the
three merges expose a user-visible surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:37:42 -07:00
Hongming Wang
496dee8e13 feat(platform): GET /admin/workspaces/:id/test-token for E2E (#6)
Adds a gated admin endpoint that mints a fresh workspace bearer token on
demand, eliminating the register-race currently used by
test_comprehensive_e2e.sh (PR #5 follow-up).

- New handler admin_test_token.go: returns 404 unless MOLECULE_ENV != production
  or MOLECULE_ENABLE_TEST_TOKENS=1. Hides route existence in prod (404 not 403).
- Mints via wsauth.IssueToken; logs at INFO without the token itself.
- Verifies workspace exists before minting (missing -> 404, never 500).
- Tests cover prod-hidden, enable-flag-overrides-prod, missing workspace,
  and happy-path + token-validates round trip.
- tests/e2e/_lib.sh gains e2e_mint_test_token helper for downstream adoption.
- CLAUDE.md updated with route + env vars.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 09:35:26 -07:00
Hongming Wang
cd4eb9c590 Merge pull request #49 from Molecule-AI/feat/hermes-pr2
feat(hermes): implement create_executor() with HERMES_API_KEY / OPENROUTER_API_KEY fallback + smoke tests
2026-04-14 08:16:15 -07:00
Hongming Wang
9255ba2ada docs(hermes): document HERMES_API_KEY env var and runtime-table row
Adds HERMES_API_KEY to .env.example with a cross-reference to the
OPENROUTER_API_KEY fallback, and adds the hermes runtime row to the
CLAUDE.md runtime table so the new adapter is discoverable alongside
its siblings (langgraph, claude-code, openclaw, crewai, autogen,
deepagents).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 08:11:37 -07:00
Hongming Wang
77081a315b Merge pull request #16 from Molecule-AI/fix/infra-compose-external-network
fix(infra): attach docker-compose.infra.yml services to molecule-monorepo-net + add Temporal
2026-04-13 22:19:36 -07:00
Hongming Wang
708eb73fd8 docs(gate-5): document Temporal dependency in CLAUDE.md/PLAN.md 2026-04-13 21:38:25 -07:00
Hongming Wang
f6efd64839 fix: gate-5 document browser-automation plugin in CLAUDE.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 21:37:29 -07:00
Hongming Wang
659c4146c8 docs: correct stale test counts in PR #9
Subagent used old CLAUDE.md baselines instead of measuring actuals.
Verified counts via pytest --collect-only and go test -v:

- Go platform: 536 → 695 (+159 off)
- Python workspace-template: 1084 → 1140 (+56 off)
- SDK python: 121 → 132 (+11 off)
- Canvas vitest: 357 (already correct)
- MCP jest: 97 (already correct)

Files updated:
- CLAUDE.md (Unit Tests block)
- PLAN.md (Test Coverage table + totals: 2,295 → 2,421)
- docs/development/local-development.md
- docs/edit-history/2026-04-13.md (session test-count table +
  explanatory note about why the Python and SDK counts didn't
  change today)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:51:12 -07:00
Hongming Wang
eca9796a5b docs: sync documentation with 2026-04-13 merges (PRs #1-#8)
Covers today's quality + infra pass: brand/structural cleanup, MCP
per-domain refactor (1697 -> 89 lines, 87 tools), canvas ConfirmDialog
unification, 4 platform handler decompositions (+47 Go tests), E2E
hardening for Phase 30.1/30.6 auth, and two new CI jobs (e2e-api +
shellcheck).

- CLAUDE.md: updated test counts (Go 536, canvas 357, SDK 121, MCP 97,
  workspace 1084); documented MCP per-domain split + new api.ts; added
  handler-decomposition section; Phase 30.1/30.6 auth callout; new
  CI jobs; env vars cross-ref.
- PLAN.md: Phase 31 "Quality + Infra Pass" marked shipped; test totals
  refreshed to 2,295.
- README.zh-CN.md: license badge MIT -> BSL 1.1; added BSL license block.
- docs/api-protocol/platform-api.md: registry table gains Auth column
  documenting Phase 30.1 bearer-token and Phase 30.6 X-Workspace-ID
  requirements on heartbeat/update-card/discover/peers.
- docs/development/local-development.md: updated stale test counts;
  added e2e-api + shellcheck CI jobs; pointer to new testing-e2e.md.
- docs/development/testing-e2e.md: new — per-script reference, auth
  prerequisites, local run, CI coverage, adding-a-new-check checklist.
- docs/edit-history/2026-04-13.md: top-of-file summary section added
  spanning PRs #1-#8; preserves existing per-feature entries below.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:46:28 -07:00
Hongming Wang
50b0a1859a refactor(mcp-server): DRY envelopes, typed apiCall, explicit re-exports
Second-pass cleanup after the monolith split. Addresses every issue
from the code-review pass.

Core additions in src/api.ts:
- toMcpResult(data) + toMcpText(text): single source of truth for the
  MCP text-content envelope (was ~87 duplicated literals)
- ApiError type + isApiError(v) guard: typed discriminated-union for
  the error-by-value pattern; replaces open-coded shape checks
- apiCall<T = unknown>: generic so callers can document expected
  response shape without unchecked "as" casts

Bulk cleanups across all 12 tools/*.ts:
- Every handler now returns toMcpResult(data) or toMcpText(text)
- Open-coded "typeof obj === 'object' && 'error' in obj" in
  remote_agents.ts replaced with isApiError(v)
- Extracted initialCanvasPosition() helper out of
  handleCreateWorkspace; explains why random seeding exists
- Added runtime/workspace_dir/workspace_access to create_workspace
  zod schema (previously accepted by handler but hidden from clients)

src/index.ts:
- Replaced "export * from" with explicit named re-exports so the
  public surface is auditable and future name collisions fail loudly

Tests:
- createServer() smoke test that records every srv.tool(...) call and
  asserts 87 registered tools unique by name. Catches future PRs that
  forget to wire a registerXxxTools(srv).

Docs:
- Fix broken relative links in sdk/python/molecule_agent/README.md
  (was ../../examples/ from inside sdk/python/, should be ../examples/)
- Update stale "61 tools" -> "87 tools" in CLAUDE.md + main() log

Verification:
- npm run build clean
- npx jest -> 97/97 passed (was 96; +1 smoke test)
- grep "content: [{ type: \"text\" as const" src/tools/ -> 0 matches
- No file over 216 lines

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:26:17 -07:00
Hongming Wang
24fec62d7f initial commit — Molecule AI platform
Forked clean from public hackathon repo (Starfire-AgentTeam, BSL 1.1)
with full rebrand to Molecule AI under github.com/Molecule-AI/molecule-monorepo.

Brand: Starfire → Molecule AI.
Slug: starfire / agent-molecule → molecule.
Env vars: STARFIRE_* → MOLECULE_*.
Go module: github.com/agent-molecule/platform → github.com/Molecule-AI/molecule-monorepo/platform.
Python packages: starfire_plugin → molecule_plugin, starfire_agent → molecule_agent.
DB: agentmolecule → molecule.

History truncated; see public repo for prior commits and contributor
attribution. Verified green: go test -race ./... (platform), pytest
(workspace-template 1129 + sdk 132), vitest (canvas 352), build (mcp).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:55:37 -07:00