Commit Graph

155 Commits

Author SHA1 Message Date
Hongming Wang
31fca5ea6e
Merge pull request #82 from Molecule-AI/feat/mirror-to-fly-registry
feat(ci): mirror platform image to registry.fly.io/molecule-tenant
2026-04-14 17:16:04 -07:00
Hongming Wang
73dbca4e38 review: split push steps, runbook for secret rotation, username clarity
Addresses PR #82 code review: 🟡×3 + 🔵×5.

- Fly registry login username: 'x' → 'molecule-ai' + explanatory comment.
- Build & push split into two steps (GHCR / Fly registry) so a single-
  registry outage can't fail the other. Second step uses 'if: always()'
  to ensure Fly mirror runs even if GHCR push flakes.
- docs/runbooks/saas-secrets.md: full secret map + rotation procedures
  for every SaaS credential, with danger-case callouts. Documents the
  coupled FLY_API_TOKEN (lives in GHA secret AND fly secrets — must be
  rotated in both).
- CLAUDE.md: new 'SaaS ops' section linking to the runbook.
2026-04-14 17:09:11 -07:00
Hongming Wang
6bcafd643e feat(ci): mirror platform image to registry.fly.io/molecule-tenant
Keeps ghcr.io/molecule-ai/platform private (per CEO direction — open-
source when full SaaS ships) while still letting the private control
plane's Fly provisioner boot tenant machines: Fly auto-authenticates
same-org machines against registry.fly.io, no per-tenant pull
credentials to wire.

Workflow now logs into both GHCR (using built-in GITHUB_TOKEN) and
Fly registry (using FLY_API_TOKEN secret) and pushes the same image to
four tags total:
- ghcr.io/molecule-ai/platform:latest
- ghcr.io/molecule-ai/platform:sha-<short>
- registry.fly.io/molecule-tenant:latest
- registry.fly.io/molecule-tenant:sha-<short>
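
The four tags above can be derived from a commit SHA roughly as follows. This is a sketch only: the helper name `image_tags` and the 7-character short-SHA length are assumptions for illustration, not taken from the workflow file.

```python
from typing import List

# Registry repositories named in the commit message.
REPOSITORIES = (
    "ghcr.io/molecule-ai/platform",
    "registry.fly.io/molecule-tenant",
)


def image_tags(commit_sha: str, short_len: int = 7) -> List[str]:
    """Return the floating and the pinned tag for each registry.

    short_len=7 is an assumed short-SHA length, not the workflow's value.
    """
    short = commit_sha[:short_len]
    tags = []
    for repo in REPOSITORIES:
        tags.append(f"{repo}:latest")        # floating tag
        tags.append(f"{repo}:sha-{short}")   # immutable, pin-friendly tag
    return tags
```

Both registries receive the same image, so only the repository prefix differs between the two pairs.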

Secret added via `gh secret set FLY_API_TOKEN` on the public repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:05:36 -07:00
Hongming Wang
c3cc8e8725
Merge pull request #80 from Molecule-AI/feat/ghcr-platform-image
feat(ci): publish-platform-image → ghcr.io/molecule-ai/platform (Phase B.2)
2026-04-14 16:41:59 -07:00
Hongming Wang
d53a128774
Merge pull request #79 from Molecule-AI/docs/sync-2026-04-14-tick-8
docs: sync documentation with 2026-04-14 tick-8 merge (#78)
2026-04-14 16:40:27 -07:00
Hongming Wang
92a06a8684 feat(ci): publish-platform-image workflow → ghcr.io/molecule-ai/platform
Phase B.2 companion to the private molecule-controlplane provisioner PR.
On every push to main that touches platform/**, builds platform/Dockerfile
and pushes to GHCR with two tags:

- :latest              (floating, always main's tip)
- :sha-<short-commit>  (immutable, pin-friendly)

Cache via GitHub Actions cache (cache-from: type=gha). Workflow_dispatch
trigger so we can re-publish after a docs-only merge if needed.

The private molecule-controlplane sets TENANT_IMAGE=ghcr.io/molecule-ai/platform:<tag>
and the provisioner creates each tenant Fly Machine from this image. Staying
on the same base image across tenants keeps upgrades atomic.

CLAUDE.md updated to document the new workflow in the CI pipeline section.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:37:49 -07:00
Hongming Wang
19fd82e2c3 chore: hardcode moleculesai.app as production domain
Domain confirmed: MOLECULESAI.APP. Updates the Phase 32 success-criteria line in PLAN.md to point at the real domain.
2026-04-14 16:03:35 -07:00
Hongming Wang
574d6d9b0a docs: sync documentation with 2026-04-14 tick-8 merge (#78)
- CLAUDE.md: Go test count 740 → 746; MOLECULE_ORG_ID env var documented.
- PLAN.md: new "Recently launched (2026-04-14 tick-8)" block covering
  Phase 32 PR #1 + paired private molecule-controlplane repo scaffolding.
- docs/edit-history/2026-04-14.md: tick-8 breakdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:41:45 -07:00
Hongming Wang
57a05686a4
Merge pull request #78 from Molecule-AI/feat/saas-tenant-guard-middleware
feat(platform): TenantGuard middleware — public repo's only SaaS hook (Phase 32 PR #1)
2026-04-14 15:40:35 -07:00
Hongming Wang
2094f4f0c2 feat(platform): TenantGuard middleware — public repo's only SaaS hook
Phase 32 foundation. The SaaS control plane (private molecule-controlplane
repo) provisions one platform instance per customer org on Fly Machines
and sets MOLECULE_ORG_ID=<uuid> on the machine. Its subdomain router
forwards requests with X-Molecule-Org-Id=<uuid>.

TenantGuard:
- When MOLECULE_ORG_ID is set → every non-allowlisted request must carry a
  matching X-Molecule-Org-Id header. Mismatched/missing header → 404 (not
  403 — don't leak tenant existence by letting probers distinguish "wrong
  org" from "route doesn't exist").
- When unset → passthrough. Self-hosted / dev / CI behavior unchanged.
- Allowlist is exact-match, not prefix — /health and /metrics only.
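
The three rules above can be sketched as a pure decision function. The middleware itself lives in the Go platform; this Python version only illustrates the logic, and the function name, the argument shapes, and the assumption that header keys arrive in canonical case are all hypothetical.

```python
from typing import Mapping, Optional

# Exact-match allowlist, per the commit message.
ALLOWLIST = frozenset({"/health", "/metrics"})
ORG_HEADER = "X-Molecule-Org-Id"


def tenant_guard_status(
    configured_org_id: Optional[str],
    path: str,
    headers: Mapping[str, str],
) -> Optional[int]:
    """Return 404 to reject the request, or None to let it through.

    Assumes header keys are already in canonical case (a simplification).
    """
    if not configured_org_id:
        return None  # MOLECULE_ORG_ID unset: self-hosted passthrough
    if path in ALLOWLIST:
        return None  # exact match only, so "/healthz" is not allowlisted
    if headers.get(ORG_HEADER) != configured_org_id:
        return 404  # missing or mismatched header looks like a missing route
    return None
```

Returning the same 404 for a missing header and a wrong header is what keeps probers from telling the two cases apart.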

No orgs table, no signup, no billing, no Fly provisioning in this repo —
all that lives in the private control plane. The public repo's SaaS
surface is exactly this one middleware.

6 tests covering: unset-is-passthrough, matching header, mismatched
header 404 (with empty body), missing header 404, allowlist bypass, and
allowlist-is-exact-match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:20:33 -07:00
Hongming Wang
a04207aba6
Merge pull request #77 from Molecule-AI/docs/sync-2026-04-14-tick-7
docs: sync documentation with 2026-04-14 tick-7 merges (#74, #75, #76)
2026-04-14 14:59:08 -07:00
Hongming Wang
1dabb35e17 docs: sync documentation with 2026-04-14 tick-7 merges (#74, #75, #76)
- CLAUDE.md: Go test count 731 → 740; migration count 16 → 23;
  workspace_schedules.source column documented in Database section.
- PLAN.md: new "Recently launched (2026-04-14 tick-7)" section for
  PRs #74/#75/#76 and closed issues #24/#51.
- docs/edit-history/2026-04-14.md: per-PR breakdown of tick-7 merges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:43:16 -07:00
Hongming Wang
07a5ca3c51
Merge pull request #76 from Molecule-AI/fix/issue-24-schedules-db-authoritative
fix(org): DB-authoritative schedules; org/import is additive on template rows (#24)
2026-04-14 14:40:54 -07:00
Hongming Wang
dee5322d22
Merge pull request #75 from Molecule-AI/feat/issue-51-category-routing
feat(platform): generic category_routing replaces hardcoded audit dispatch (#51)
2026-04-14 14:40:51 -07:00
Hongming Wang
20068196bb
Merge pull request #74 from Molecule-AI/chore/template-plugin-union-cleanup
chore(template): simplify per-role plugin lists using #71 union semantics
2026-04-14 14:40:48 -07:00
Hongming Wang
911580c625
Merge pull request #73 from Molecule-AI/docs/sync-2026-04-14-tick-6
docs: sync documentation with 2026-04-14 tick-6 merges (#71, #72)
2026-04-14 14:40:44 -07:00
Hongming Wang
a921644f9c fix(schedules): backfill legacy rows to 'template' + extract import SQL const
Addresses code-review warnings on PR #76:
- Migration 022 now backfills pre-existing workspace_schedules rows to
  source='template' before flipping NOT NULL + DEFAULT 'runtime'. Legacy
  rows (all seeded via org/import historically) stay refreshable on
  re-import. Down migration drops the CHECK constraint too.
- Extracted the import UPSERT into const orgImportScheduleSQL so the shape
  test asserts against the const directly instead of file-scraping org.go.
  Removed the os.ReadFile helper.
- scheduleResponse.Source gets json:",omitempty" so old clients that
  predate the migration don't see an empty string they can't explain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:30:22 -07:00
Hongming Wang
608d6745b6 fix(org): use yaml.Marshal for category_routing + newline-guard block appends
Addresses code-review warnings on PR #75:
- renderCategoryRoutingYAML now builds yaml.Node + yaml.Marshal, escaping
  YAML-reserved chars in role names correctly (was JSON-as-YAML, fragile on
  unicode line separators).
- New appendYAMLBlock helper guarantees a newline boundary when concatenating
  YAML fragments into config.yaml (category_routing + initial_prompt both
  used to risk merging into the previous line).
- Fixed struct comment (replace-per-key, not UNION).
- Added TestCategoryRouting_EscapesYAMLSpecials and TestAppendYAMLBlock_NewlineGuard.
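
The newline guard can be illustrated with a few lines of Python. The real helper is Go (appendYAMLBlock); the name `append_yaml_block` and the choice to also terminate the appended block with a newline are assumptions of this sketch.

```python
def append_yaml_block(existing: str, block: str) -> str:
    """Concatenate two YAML fragments so the block starts on its own line."""
    if existing and not existing.endswith("\n"):
        existing += "\n"  # guard: never merge into the previous line
    if block and not block.endswith("\n"):
        block += "\n"     # assumed: leave a clean boundary for the next append
    return existing + block
```

Without the first check, appending `category_routing:` to a file ending in `name: pm` would produce the single line `name: pmcategory_routing:`.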

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:28:22 -07:00
Hongming Wang
293033de23 fix(org): DB-authoritative schedules; org/import is additive on template rows (#24)
Resolves #24 per CEO direction.

DB is source of truth for workspace_schedules. POST /org/import becomes
idempotent — only touches rows it owns (source='template'); runtime-added
schedules (Canvas / API) are preserved across re-imports.

- Migration 022: adds source TEXT NOT NULL DEFAULT 'runtime' CHECK in
  ('template','runtime'); unique index on (workspace_id, name) so the
  org/import upsert can use ON CONFLICT.
- org.go: schedule INSERT becomes
    INSERT ... 'template' ON CONFLICT (workspace_id, name) DO UPDATE
      SET ... WHERE workspace_schedules.source='template'.
  Never DELETEs.
- schedules.go: runtime POST writes 'runtime' explicitly; List handler
  surfaces the source field on the response so Canvas can render badges.
- 3 new unit tests assert source='runtime' default for runtime CRUD,
  the SQL shape contract for org/import (additive + idempotent +
  runtime-preserving + never-DELETE), and List response surface.
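
The upsert shape can be tried against an in-memory SQLite database, used here only as a stand-in for the platform's real database. The `cron` column and the sample values are hypothetical; the conflict target and the `source='template'` guard follow the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE workspace_schedules (
        workspace_id TEXT NOT NULL,
        name         TEXT NOT NULL,
        cron         TEXT NOT NULL,  -- hypothetical payload column
        source       TEXT NOT NULL DEFAULT 'runtime'
                     CHECK (source IN ('template', 'runtime')),
        UNIQUE (workspace_id, name)
    )
    """
)

# Import path: insert as 'template'; on conflict, update only template rows.
IMPORT_SQL = """
INSERT INTO workspace_schedules (workspace_id, name, cron, source)
VALUES (?, ?, ?, 'template')
ON CONFLICT (workspace_id, name) DO UPDATE
    SET cron = excluded.cron
    WHERE workspace_schedules.source = 'template'
"""

# One runtime-added schedule (default source) and one template-owned schedule.
conn.execute(
    "INSERT INTO workspace_schedules (workspace_id, name, cron) VALUES (?, ?, ?)",
    ("ws1", "nightly", "0 2 * * *"),
)
conn.execute(IMPORT_SQL, ("ws1", "audit", "0 * * * *"))

# A re-import that names both schedules changes only the template row.
conn.execute(IMPORT_SQL, ("ws1", "nightly", "0 3 * * *"))
conn.execute(IMPORT_SQL, ("ws1", "audit", "30 * * * *"))

rows = conn.execute(
    "SELECT name, cron, source FROM workspace_schedules ORDER BY name"
).fetchall()
```

After the re-import, the runtime row keeps its original cron and source, and no row is ever deleted.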

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:09:44 -07:00
Hongming Wang
932ada2c59 feat(platform): generic category_routing replaces hardcoded audit dispatch (#51)
Add a category_routing block to org.yaml schema (defaults + per-workspace,
UNION semantics with per-key replace). The merged routing table is rendered
into each workspace's config.yaml at import time.

PM's system prompt loses the hardcoded security/ui/infra → role mapping
from PR #50; instead it reads category_routing from /configs/config.yaml
and delegates to whatever roles the org template lists for the incoming
audit-summary's category. Future org templates ship their own routing
without prompt churn.

Tests: 4 new TestCategoryRouting_* cases covering YAML parse, UNION+drop
semantics, deterministic config.yaml render, and empty-map handling.
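
A minimal sketch of the merge, assuming "UNION with per-key replace" means the key sets are unioned and a per-workspace entry replaces the default role list for that key. The "drop" case mentioned in the tests is not modeled because the message does not specify it. The function name, the sorted key order, and the `research` category are illustrative assumptions.

```python
from typing import Dict, List

Routing = Dict[str, List[str]]


def merge_category_routing(defaults: Routing, workspace: Routing) -> Routing:
    """Union the category keys; a workspace entry replaces the default list."""
    merged = {category: list(roles) for category, roles in defaults.items()}
    for category, roles in workspace.items():
        merged[category] = list(roles)  # replace per key, never append
    # Sorted keys keep the rendered output stable (an assumed strategy).
    return dict(sorted(merged.items()))


defaults = {
    "security": ["Security Auditor"],
    "ui": ["UIUX Designer"],
    "infra": ["DevOps"],
}
workspace = {
    "security": ["Security Auditor", "Dev Lead"],
    "research": ["Research Lead"],  # hypothetical extra category
}
routing = merge_category_routing(defaults, workspace)
```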

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:06:47 -07:00
rabbitblood
ae0ff29a5c chore(template): simplify per-role plugin lists using #71 union semantics
#71 just merged — per-workspace `plugins:` now UNIONs with `defaults.plugins`
instead of replacing it. Simplifies every override in molecule-dev/ from
"defaults+1 = list 10 items" to "defaults+1 = list 1 item":

  PM:               11 items → 2  (workflow-triage + workflow-retro)
  Research Lead:    10 items → 1  (browser-automation)
  Market Analyst:   10 items → 1
  Technical Researcher: 10 items → 1
  Competitive Intel: 10 items → 1
  Security Auditor: 12 items → 3  (code-review + cross-vendor-review + llm-judge)
  UIUX Designer:    10 items → 1  (browser-automation)

Every workspace still receives the full 9-plugin default set (ecc,
molecule-dev, superpowers, careful-bash, prompt-watchdog, audit-trail,
session-context, cron-learnings, update-docs) — verified by reading
mergePlugins() in platform/internal/handlers/org.go:645.

Also drops the stale "REPLACE not UNION" warning comments and points
defaults' header comment at the new union behaviour.

Net diff: ~30 lines removed, ~10 added. Template is now meaningfully
easier to extend — each new defaults.plugin propagates everywhere
without sweeping per-role lists.

Closes follow-up scope from PR #70.
2026-04-14 14:05:43 -07:00
Hongming Wang
7584904a7b docs: sync documentation with 2026-04-14 tick-6 merges (#71, #72)
- docs/edit-history/2026-04-14.md: append tick-6 covering PR #71 (plugins UNION) and PR #72 (tick-5 docs-sync)
- CLAUDE.md: Go test count 726 -> 731 (+5 TestPlugins_*); add Plugins section note on UNION + !/- opt-out semantics
- PLAN.md: add "Recently launched (2026-04-14 tick-6)" entry noting issue #68 is resolved by PR #71

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 13:45:02 -07:00
Hongming Wang
26622dc8ab
Merge pull request #71 from Molecule-AI/fix/issue-68-plugins-union
Merged after 7-gate verification.

Gates: 1 (CI 6/6 + 1 skip) pass, 2 (build/vet) pass, 3 (5 new TestPlugins_* + backward-compat) pass, 4 (security) pass, 5 (design) pass with 1 yellow, 6 (line review) pass, 7 N/A.

Backward-compat verified: molecule-dev/org.yaml re-lists [ecc, molecule-dev, superpowers, browser-automation] in each role; under new UNION+dedupe the merged set is identical to the prior REPLACE result. PR #70's 1 yellow (REPLACE verbosity / re-listing chore) is now closed by this change — orgs can drop the re-listing once confident.

Cross-vendor-review: second-model tooling unavailable in this worktree; Claude-only review applied per standing rule fallback.

Yellow (non-blocking, follow-up): opt-out semantics (`!plugin` / `-plugin`) are documented only in the code comment. Safety plugins like `molecule-careful-bash` can be disabled by an org.yaml using `!molecule-careful-bash` — this is operator-controlled config per I-2 and therefore acceptable, but docs/plugins/ should get an "overriding defaults" page in a follow-up.

noteworthy: plugin-semantics-change
2026-04-14 13:42:30 -07:00
Hongming Wang
3cc4e236a3
Merge pull request #72 from Molecule-AI/docs/sync-2026-04-14-tick-5
docs: sync documentation with 2026-04-14 tick-5 merges (#69, #70)
2026-04-14 13:41:45 -07:00
Hongming Wang
39bd59ba79 docs: sync documentation with 2026-04-14 tick-5 merges (#69, #70)
- docs/edit-history/2026-04-14.md — append tick-5 section covering PR #69
  (PLAN.md backlog stale-ref cleanup) and PR #70 (wire 12 modular plugins
  from PR #63 into the default molecule-dev org template; defaults 3 → 9
  plus PM + Security Auditor role extras).
- PLAN.md — add tick-5 entries under "Recently launched" noting PR #70
  activated the tick-4 plugins and PR #69 cleaned up stale backlog refs.

Both merges are docs/template-only. No code surface moved, no new env
vars, no test-count drift. CLAUDE.md, .env.example, README.md, and
README.zh-CN.md unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 13:21:30 -07:00
Hongming Wang
d9603a77ce fix(org): per-workspace plugins UNION with defaults; '!' prefix opts out (#68)
Per-workspace `plugins:` now UNIONS with `defaults.plugins` instead of
replacing. A leading `!` or `-` on a per-workspace entry opts a default
out. Backward-compatible: re-listing defaults still dedupes to the same
list.

Refactored the inline REPLACE logic into a pure helper `mergePlugins`
in org.go so it's unit-testable. Five TestPlugins_* cases added.
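
The union and opt-out rules can be sketched in Python as below. The real helper is the Go function `mergePlugins`; the ordering (defaults first, then additions) and the handling of an opt-out that names a non-default plugin are assumptions of this sketch.

```python
from typing import List

OPT_OUT_PREFIXES = ("!", "-")


def merge_plugins(defaults: List[str], workspace: List[str]) -> List[str]:
    """Union defaults with per-workspace entries, honoring '!'/'-' opt-outs."""
    opted_out = {p[1:] for p in workspace if p.startswith(OPT_OUT_PREFIXES)}
    additions = [p for p in workspace if not p.startswith(OPT_OUT_PREFIXES)]
    merged: List[str] = []
    for name in defaults + additions:
        # Dedupe while preserving first-seen order; skip opted-out names.
        if name not in opted_out and name not in merged:
            merged.append(name)
    return merged
```

Because duplicates collapse, a template that still re-lists every default produces the same result as one that lists only its additions.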

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 13:21:23 -07:00
Hongming Wang
e6d8cdfc87
Merge pull request #70 from Molecule-AI/chore/template-plugin-enrichment
chore(template): wire 9 new guardrail/skill plugins into defaults; PM + Security Auditor get role extras
2026-04-14 13:18:46 -07:00
Hongming Wang
2c89e24298
Merge pull request #69 from Molecule-AI/docs/cleanup-stale-backlog-refs
docs(plan): drop stale sequential refs from Backlog items 11-14
2026-04-14 13:18:30 -07:00
rabbitblood
def76e788f chore(template): wire 9 new guardrail/skill plugins into defaults; PM + Security Auditor get role extras
PR #63 just merged 12 new modular plugins (split from a single guardrails
bundle) and the audit pipeline (Security/UIUX/QA crons) is now producing
PRs continuously. Time to wire the new plugins into the molecule-dev
template so every workspace + every cron tick benefits.

## Defaults — universal additions (was 3, now 9)

- molecule-careful-bash         — refuse rm -rf, push --force main, DROP TABLE
- molecule-prompt-watchdog      — warn on destructive user prompts
- molecule-audit-trail          — append every Edit/Write to .claude/audit.jsonl
- molecule-session-context      — auto-load cron learnings + PR/issue counts on SessionStart
- molecule-skill-cron-learnings — per-tick learning JSONL format (pairs with session-context)
- molecule-skill-update-docs    — keep architecture/README/edit-history aligned

Kept: ecc, molecule-dev, superpowers.

## Per-role overrides

- PM: defaults + molecule-workflow-triage + molecule-workflow-retro
  (the /triage and /retro slash commands match PM's coordination role)

- Security Auditor: defaults + molecule-skill-code-review +
  molecule-skill-cross-vendor-review + molecule-skill-llm-judge
  (security PRs benefit from multi-criteria review, adversarial cross-vendor
  second opinion, and an LLM-judge gate that catches "agent shipped the
  wrong thing")

- Research Lead + 3 researchers + UIUX Designer: defaults + browser-automation
  (existing override; just synced to the new default set)

Other 5 dev roles (Dev Lead, BE, FE, DevOps, QA) inherit defaults — the
new universal set is rich enough for them; code-review skill is a runtime
opt-in if Dev Lead decides per-PR.

## REPLACE-semantics verbosity

`platform/internal/handlers/org.go:~345` treats per-workspace plugins as
REPLACE not UNION. Every override has to re-list the 9 defaults to add 1
extra. Tracked as #68 with a union-proposal; once that lands the per-role
lists shrink to just the additions.

## Test plan

- [x] YAML valid (`python -c "import yaml; yaml.safe_load(...)"`)
- [x] defaults.plugins count = 9
- [ ] After merge + re-import: every workspace's /configs/plugins/ contains
      the full set; PM has /triage and /retro commands; Security Auditor
      can invoke cross-vendor-review on its findings.
2026-04-14 13:07:05 -07:00
Hongming Wang
730bcc4e9f docs(plan): drop stale sequential refs #64-#67 from Backlog items 11-14
Backlog items 11-14 used sequential enumeration (#64/#65/#66/#67) as
intra-doc bookkeeping. Those numbers now collide with actual merged
PRs and open issues with completely different scopes:
  - PR #64 = auto-refresh global_secrets (not "delegations list")
  - PR #65 = restart context Layer 1 (not "per-agent repo access")
  - Issue #66 = restart_prompt Layer 2 (not "SDK swallows stderr")
  - PR #67 = docs sync tick-4 (not "MCP localhost default")

Strip the misleading refs and add a footnote explaining the cleanup.
If/when any of these items get prioritized, file real GitHub issues.

Tracked in cron-learnings tick-3 entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 13:05:08 -07:00
Hongming Wang
b9b96c9cff
Merge pull request #67 from Molecule-AI/docs/sync-2026-04-14-tick-4
docs: sync documentation with 2026-04-14 evening-tick merges (#63, #64, #65)
2026-04-14 13:03:18 -07:00
Hongming Wang
2fa6f7c6cd docs: sync documentation with 2026-04-14 evening-tick merges (#63, #64, #65)
- edit-history/2026-04-14.md: append tick-4 section covering the 12
  modular guardrail plugins (#63), global-secrets auto-restart fan-out
  (#64, fixes issue #15), and synthetic restart-context A2A message
  (#65, fixes issue #19 Layer 1; Layer 2 deferred to issue #66).
- CLAUDE.md: bump Go test count 699 -> 726 (measured); note global
  secrets auto-restart on SetGlobal/DeleteGlobal in the route table;
  add Workspace Lifecycle paragraph for the restart-context message
  and its system:restart-context caller prefix.
- PLAN.md: bump Go test count in the coverage table; record issues
  #15 and #19 Layer 1 as launched; add new Backlog entry for the
  Layer 2 follow-up (issue #66).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:54:04 -07:00
Hongming Wang
383582fbbf
Merge pull request #64 from Molecule-AI/fix/issue-15-refresh-oauth-on-restart
fix(secrets): auto-refresh global_secrets on workspace restart (#15)
2026-04-14 12:49:19 -07:00
Hongming Wang
3ea8cda5b0
Merge pull request #65 from Molecule-AI/fix/issue-19-restart-context-layer1
feat(platform): inject restart context system message (#19 Layer 1)
2026-04-14 12:48:19 -07:00
Hongming Wang
8b896b1a56
feat(plugins): split guardrails into 12 modular plugins (#63)
Noteworthy: large-addition (+1601 lines, 12 new plugins) + modifies core AgentskillsAdaptor (SDK + runtime copies, drift-guarded). All 7 gates pass, 0 critical findings. Cross-vendor review skipped (tool unavailable).
2026-04-14 12:47:24 -07:00
Hongming Wang
c4240e32c1 feat(platform): inject restart context system message (#19 Layer 1)
After a workspace restart (HTTP /restart or programmatic RestartByID) and
re-registration, the platform sends a synthetic A2A message/send to the
workspace containing:
- restart timestamp
- previous session end timestamp + human duration
- env-var keys now available (keys only — never values)

The message is rendered in the format proposed in #19 and marked with
metadata.kind=restart_context so agents can detect and handle it
specifically if they choose.

Skip path: if the workspace doesn't re-register within 30s, log and drop.
The Restart HTTP response is unaffected by delivery success.
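
A rough sketch of how such a message could be assembled. The text layout, the function names, and the dictionary shape are hypothetical (the format proposed in #19 is not reproduced here); only the three listed fields, the keys-only rule, and `metadata.kind=restart_context` come from the description above.

```python
from datetime import datetime, timezone
from typing import Dict


def human_duration(seconds: int) -> str:
    """Format a non-negative number of seconds as e.g. '1h 2m 5s'."""
    hours, rem = divmod(max(seconds, 0), 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    if secs or not parts:
        parts.append(f"{secs}s")
    return " ".join(parts)


def build_restart_context(
    restarted_at: datetime, previous_end: datetime, env: Dict[str, str]
) -> dict:
    """Build the synthetic message; env values are deliberately never read."""
    gap = int((restarted_at - previous_end).total_seconds())
    lines = [
        f"Restarted at: {restarted_at.isoformat()}",
        f"Previous session ended: {previous_end.isoformat()} "
        f"({human_duration(gap)} earlier)",
        "Env vars available: " + ", ".join(sorted(env)),  # keys only
    ]
    return {"text": "\n".join(lines), "metadata": {"kind": "restart_context"}}
```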

Layer 2 (user-defined restart_prompt via config.yaml / org.yaml) is
deferred — tracked as a separate follow-up issue.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:41:01 -07:00
Hongming Wang
e658f86c08 fix(secrets): auto-restart workspaces on global secret change (#15)
Global secrets (e.g. CLAUDE_CODE_OAUTH_TOKEN) are injected as container env
vars at Start() time. Until now, rotating one only propagated to a workspace
on the next full restart-from-zero, which manual ops had to drive via a
`POST /workspaces/:id/restart` loop. Tier-3 Claude Code agents hit the
stale-token path first and surfaced as 401s inside the SDK.

Restart-time re-read of global_secrets + workspace_secrets was already
correct in `provisionWorkspaceOpts` — the missing piece was the trigger.
SetGlobal / DeleteGlobal now enqueue RestartByID for every non-paused,
non-removed, non-external workspace that does NOT shadow the key with a
workspace-level override. Matches the existing behaviour of workspace-scoped
`Set` / `Delete`.
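
The selection rule can be sketched as a filter over workspaces. The `Workspace` record, its field names, and the status strings are hypothetical stand-ins for whatever the platform stores; only the four exclusion conditions come from the text above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Workspace:
    """Hypothetical record; field names are illustrative only."""
    id: str
    status: str = "running"          # assumed values: running, paused, removed
    external: bool = False
    secret_keys: FrozenSet[str] = frozenset()  # workspace-level overrides


def workspaces_to_restart(
    workspaces: Iterable[Workspace], changed_key: str
) -> List[str]:
    """Return ids of workspaces that should restart after a global change."""
    return [
        w.id
        for w in workspaces
        if w.status not in ("paused", "removed")
        and not w.external
        and changed_key not in w.secret_keys  # shadowed keys are unaffected
    ]
```

A workspace that overrides the key locally never sees the global value, so restarting it would change nothing.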

Adds two sqlmock-backed tests exercising both branches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:39:00 -07:00
Hongming Wang
d0eaa814de fix(gate-4): add missing import json in sdk/python/molecule_plugin/builtins.py
PR #63 code-review caught that the SDK copy of AgentskillsAdaptor uses
json.loads/json.dumps in _merge_settings_fragment + _rewrite_hook_paths
+ _deep_merge_hooks but never imports json. The runtime copy
(workspace-template/plugins_registry/builtins.py) already has the
import; this brings the SDK side in line.

Bug surfaces only when a plugin shipping settings-fragment.json (any
of the 5 hook plugins or 2 workflow plugins in this PR) is installed
through the SDK path — would NameError on the first json.loads call.
The drift test catches behavioral drift via fixture install scenarios
but not import-level drift in helper code paths.

Verified: json is now importable (`hasattr(molecule_plugin.builtins,
'json')` → True), drift test still passes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:29:32 -07:00
Hongming Wang
9c7f57688c
Merge pull request #57 from Molecule-AI/fix/issue-12-preserve-claude-sessions
fix(provisioner): preserve Claude session directory across restart (#12)
2026-04-14 12:26:12 -07:00
Hongming Wang
d0c5626df1
Merge pull request #61 from Molecule-AI/feat/claude-hooks-upgrade
feat(.claude): ambient hooks + sequential-thinking MCP + /triage command
2026-04-14 12:25:54 -07:00
Hongming Wang
bab8110d34
Merge pull request #60 from Molecule-AI/feat/gstack-inspired-cron-upgrades
feat(.claude): 5 gstack-inspired skills + cron upgrades
2026-04-14 12:25:19 -07:00
Hongming Wang
18a5d1a538
Merge pull request #58 from Molecule-AI/feat/issue-14-configurable-tier-limits
noteworthy: behavior-change — T3/T4 caps introduced where previously unlimited; defaults match issue #14 spec; operators can override via env
2026-04-14 12:25:00 -07:00
Hongming Wang
2e873cc2e8
docs(plan): add Phase 32 — Cloud SaaS launch roadmap (#59)
New section before the Temporal footnote capturing the gap analysis
between today's self-hosted posture and a multi-tenant cloud SaaS:

- Tier 1 blockers: multi-tenancy (org_id everywhere), WorkOS AuthKit
  for human auth, Fly Machines for container isolation, Stripe
  billing, per-org quotas, managed Postgres/Redis (Neon/Upstash),
  KMS-backed secrets, migrations out of app boot
- Tier 1 follow-ups: Sentry + Grafana, per-org rate limiting,
  Cloudflare, onboarding flow, transactional email, admin panel,
  ToS/DPA
- Tier 2 tech-stack upgrades (non-blocking): pgx/v5 + sqlc, River
  for platform async (NOT Temporal — that stays in workspace-template
  as an agent tool), TanStack Query, Turbopack, uv for Python,
  Python MCP client, shadcn/ui CLI
- Tier 3 explicitly NOT doing: Kubernetes, ORMs, framework swaps,
  build-auth-yourself, canvas library swaps — with reasons
- Tier 4 compliance (post-revenue): SOC 2, status page, staging,
  canary deploys, load testing
- Success criteria: sign-up-to-first-message < 5 min, tenant
  isolation red-teamed, Fly Machines cost documented, Stripe
  end-to-end, first paying design partner

Derived from a tech-stack audit run against the 2026 best-in-class
landscape (pgx won Postgres, River eats Temporal's small-company
slot, WorkOS beats Clerk for per-org SSO, Fly Machines is the only
isolation option without an SRE).

Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:24:59 -07:00
Hongming Wang
b123294cf2
Merge pull request #56 from Molecule-AI/docs/sync-2026-04-14-tick-3
docs: sync documentation with 2026-04-14 tick-3 merges (#53, #54, #55)
2026-04-14 12:24:16 -07:00
Hongming Wang
90a513d1d0 feat(plugins): split guardrails into 12 modular plugins
Replaces the proposed monolithic molecule-guardrails plugin with 12
single-purpose plugins users can install à la carte. Powered by a
small extension to the AgentskillsAdaptor base class so any plugin can
ship hooks/, commands/, and a settings-fragment.json without writing a
custom adapter.

## Base adapter changes

workspace-template/plugins_registry/builtins.py + sdk/python/molecule_plugin/builtins.py
(both copies — drift-tested):
- New _install_claude_layer() helper called at the end of install()
- Conditionally copies hooks/ → /configs/.claude/hooks/ (preserving exec bit)
- Conditionally copies commands/*.md → /configs/.claude/commands/
- Conditionally merges settings-fragment.json into /configs/.claude/settings.json
  with ${CLAUDE_DIR} placeholder rewritten to the workspace's absolute install
  path. Existing user hooks are preserved (deep-merge by event name).
- All steps no-op when the plugin doesn't ship the corresponding files,
  so existing skill+rule plugins (molecule-dev, superpowers, ecc,
  browser-automation) are unchanged.
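
The settings merge can be sketched as below. This is not the adapter's code: the function names are hypothetical, and de-duplicating identical entries is an assumption. The `${CLAUDE_DIR}` rewrite and the merge by event name follow the description above.

```python
import copy
from typing import Any

PLACEHOLDER = "${CLAUDE_DIR}"


def rewrite_paths(value: Any, claude_dir: str) -> Any:
    """Recursively replace the placeholder in every string of a JSON value."""
    if isinstance(value, str):
        return value.replace(PLACEHOLDER, claude_dir)
    if isinstance(value, list):
        return [rewrite_paths(v, claude_dir) for v in value]
    if isinstance(value, dict):
        return {k: rewrite_paths(v, claude_dir) for k, v in value.items()}
    return value


def merge_settings_fragment(settings: dict, fragment: dict, claude_dir: str) -> dict:
    """Merge a plugin's hook entries into settings, keeping existing hooks."""
    merged = copy.deepcopy(settings)  # never mutate the caller's settings
    hooks = merged.setdefault("hooks", {})
    for event, entries in rewrite_paths(fragment, claude_dir).get("hooks", {}).items():
        existing = hooks.setdefault(event, [])
        for entry in entries:
            if entry not in existing:  # assumed: skip exact duplicates
                existing.append(entry)
    return merged
```

Rewriting after parsing, rather than on the raw JSON text, avoids breaking the document when the install path contains characters that need JSON escaping.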

Drift test (tests/test_plugins_builtins_drift.py) still passes.

## 12 new plugins

Hook plugins (ambient enforcement):
- molecule-careful-bash       — refuses destructive bash; ships careful-mode skill
- molecule-freeze-scope       — locks edits via .claude/freeze
- molecule-audit-trail        — appends every Edit/Write to audit.jsonl
- molecule-session-context    — auto-loads cron-learnings at session start
- molecule-prompt-watchdog    — injects warnings on destructive prompt keywords

Skill plugins (on-demand):
- molecule-skill-code-review        — 16-criteria multi-axis review
- molecule-skill-cross-vendor-review — adversarial second-model review
- molecule-skill-llm-judge          — deliverable-vs-request scoring
- molecule-skill-update-docs        — post-merge doc sync
- molecule-skill-cron-learnings     — operational-memory JSONL format

Workflow plugins (slash commands):
- molecule-workflow-triage  — /triage full PR-triage cycle
- molecule-workflow-retro   — /retro + cron-retro skill, weekly retrospective

Each ships only what it needs — most have just plugin.yaml + skills/ or
hooks/ + adapter (one-line stub: `from plugins_registry.builtins import
AgentskillsAdaptor as Adaptor`). Total ~120 files but each plugin is
small and self-contained.

## Verification

- python3 -m molecule_plugin validate plugins/molecule-* → all 13 valid
  (12 new + pre-existing molecule-dev)
- End-to-end install smoke test on representative samples: hook plugin
  (molecule-careful-bash), skill-only plugin (molecule-skill-code-review),
  workflow plugin (molecule-workflow-triage). All produce expected
  /configs/ tree, settings.json paths rewritten, exec bits preserved,
  zero warnings.
- workspace-template pytest tests/test_plugins_builtins_drift.py → passes
  (SDK + runtime stay in sync).

## CLAUDE.md repo-doc updated

Lists all 12 new plugins under the existing Plugins section, organized
by category (hook / skill / workflow). Each entry one line, recommend-
together hints where dependencies make sense.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:20:04 -07:00
Hongming Wang
3f8eb7406f feat(.claude): ambient hooks + sequential-thinking MCP + /triage command
Skills are opt-in (I have to remember to invoke them). Hooks are
ambient — they fire on every matching event automatically. This PR
moves the careful-mode and learnings discipline from "doc I should
read" to "harness-enforced behavior I cannot bypass".

## 6 new hooks (.claude/hooks/)

- pre-bash-careful — REFUSES git push --force to main, rm -rf at root,
  DROP TABLE against prod schema. WARNs on force-with-lease and on
  gh pr/issue close. Tested: blocks the destructive case, allows safe ones.
- pre-edit-freeze — implements /freeze. When .claude/freeze contains
  a path glob, edits outside it are denied. Tested: edits to PLAN.md
  blocked when scope locked to platform/internal/handlers/.
- session-start-context — auto-loads last 20 cron-learnings, freeze
  status, open-PR/issue counts as additionalContext at session start.
  Tested: emits valid SessionStart JSON.
- post-edit-audit — appends every Edit/Write to .claude/audit.jsonl
  (gitignored). One-line records {ts, tool, file, ok}. Tested: appends valid JSONL.
- user-prompt-tag — injects context warnings when prompt mentions
  force-push, drop-table, "delete all", "push to main", etc. Tested:
  emits warning for "force push the fix to main".
- subagent-stop-judge — off by default; touch .claude/judge-subagents
  to enable. When on, prompts orchestrator to verify subagent's last
  message addresses the original task. Cost-free MVP (no LLM call yet).

All hooks are Python (jq isn't on the hook PATH on macOS — Python is).
Shared helpers in _lib.py: read_input, deny_pretooluse, add_context,
warn_to_stderr.
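
The REFUSE/WARN split of pre-bash-careful can be approximated with a small classifier. This is a sketch, not the hook itself: the function name and matching rules are assumptions, any DROP TABLE is refused regardless of schema, and a force-push to a branch other than main is treated as a warning by assumption.

```python
import re
import shlex
from typing import List


def _tokens(command: str) -> List[str]:
    """Split like a shell would; fall back to whitespace on bad quoting."""
    try:
        return shlex.split(command)
    except ValueError:
        return command.split()


def classify_command(command: str) -> str:
    """Return 'refuse', 'warn', or 'allow' for a bash command string."""
    if re.search(r"\bdrop\s+table\b", command, re.IGNORECASE):
        return "refuse"  # simplification: no prod-schema detection
    tokens = _tokens(command)
    if tokens[:2] == ["git", "push"]:
        force = "--force" in tokens or "-f" in tokens
        lease = any(t.startswith("--force-with-lease") for t in tokens)
        if force and "main" in tokens:
            return "refuse"
        if force or lease:
            return "warn"  # assumed: non-main force-push only warns
    if tokens[:1] == ["rm"]:
        flags = {c for t in tokens[1:] if t.startswith("-") and not t.startswith("--") for c in t[1:]}
        targets = [t for t in tokens[1:] if not t.startswith("-")]
        if flags & {"r", "R"} and "f" in flags and "/" in targets:
            return "refuse"
    if len(tokens) >= 3 and tokens[0] == "gh" and tokens[1] in ("pr", "issue") and tokens[2] == "close":
        return "warn"
    return "allow"
```

A real hook would read the tool input from stdin and emit a deny decision; this function covers only the classification step.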

## settings.json — wires all 6 hooks

Adds SessionStart, UserPromptSubmit, SubagentStop event handlers.
Existing PreToolUse:Bash + PostToolUse:Edit chains gain the new hooks
alongside the existing ones (check-inbox.sh, echo reminder).

Adds @modelcontextprotocol/server-sequential-thinking MCP server for
structured chain-of-thought scratchpad — useful when triaging multiple
PRs in parallel without losing context.

## .claude/commands/triage.md — slash command shortcut

Manual /triage runs the same flow as the c5074cd5 hourly cron, on
demand. Saves ~4KB of prompt every invocation by pulling the cron
prompt out of working memory.

## CLAUDE.md additions

New "Agent operating rules (auto-loaded — read first)" section right
after Ecosystem Context. Documents:
- Cron / triage discipline (read learnings, treat docs PRs touching
  CLAUDE.md/PLAN.md as noteworthy, write per-tick reflections)
- Table of all 6 hooks active in this repo
- List of skills and how to invoke them
- Standing rules (inviolable) consolidated for the agent

This block auto-loads into every conversation context — free behavior
change without me remembering to opt in.

## .gitignore

audit.jsonl, freeze, judge-subagents, per-tick-reflections.md are all
local operational state, never committed.

## Verification

- echo '{"tool_input":{"command":"git push --force origin main"}}' |
  bash pre-bash-careful.sh → emits deny JSON ✓
- Same for git status (safe command) → empty output, exit 0 ✓
- pre-edit-freeze with .claude/freeze=platform/handlers/ blocks
  edits to PLAN.md, allows edits inside the locked path ✓
- post-edit-audit appends valid JSONL ✓
- session-start-context emits additionalContext with PR/issue counts ✓
- user-prompt-tag emits warning for "force push to main" prompt ✓
- python3 -c "json.load(open('.claude/settings.json'))" → valid ✓
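
The pre-edit-freeze check above amounts to a path-prefix test. A rough sketch, assuming the freeze file holds a single repo-relative prefix; `edit_allowed` is a hypothetical name, and the real hook may normalize paths differently:

```python
from pathlib import PurePosixPath


def edit_allowed(target, frozen_prefix):
    """Return True when `target` lies inside `frozen_prefix`.

    An empty prefix means no freeze is active, so every edit is allowed.
    Comparison is by whole path components, so "platform/handlers_old"
    does not count as being inside "platform/handlers/".
    """
    if not frozen_prefix:
        return True
    target_parts = PurePosixPath(target).parts
    prefix_parts = PurePosixPath(frozen_prefix).parts
    return target_parts[:len(prefix_parts)] == prefix_parts
```

With the prefix `platform/handlers/`, this rejects `PLAN.md` and accepts `platform/handlers/workspace.go`. It does not resolve `..` segments, which a production check would need to handle.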

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:00:35 -07:00
Hongming Wang
9d914193d2 feat(.claude): 5 gstack-inspired skills + cron upgrades
Research on garrytan/gstack surfaced 5 patterns worth importing into
our cron / agent setup. These are skills, not platform code — they
guide how the cron and our own subagents work, not what the platform
does at runtime.

## New skills

1. **cross-vendor-review** — adversarial second-model review for
   noteworthy PRs (auth, billing, data deletion, migrations). Catches
   the 15-30% of bugs single-model review misses. Inspired by
   gstack's /codex.

2. **careful-mode** — REFUSE/WARN/ALLOW lists for destructive
   commands. Refuses force-push to main, blocks merging draft PRs,
   prevents rm -rf outside scratch dirs. Inspired by gstack's
   /careful + /freeze.

3. **cron-learnings** — per-project JSONL of operational learnings
   appended at the end of every tick, replayed at the start of the
   next. Stops the cron from re-litigating decided issues.
   Inspired by gstack's /learn.

4. **cron-retro** — weekly retrospective auto-posted as a GitHub
   issue. Sunday 23:07 local. Tracks PR count, time-to-merge, gate
   failure trends, code-review severity over time. Inspired by
   gstack's /retro.

5. **llm-judge** — cheap LLM-as-judge eval to catch "agent shipped
   the wrong thing" — the failure mode unit tests miss. Plug into
   issue-pickup pipeline so worker-agent draft PRs get scored before
   being marked ready. Inspired by gstack's tier-3 test infra.
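
As an illustration of the cron-learnings pattern (append at the end of a tick, replay at the start of the next), here is a small sketch. The file path, function names, and record fields are assumptions for the example, not the skill's actual format:

```python
import json
from pathlib import Path


def append_learning(path, learning):
    """End of tick: append one learning as a single JSONL line."""
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(learning) + "\n")


def replay_learnings(path):
    """Start of tick: load every prior learning, oldest first."""
    p = Path(path)
    if not p.exists():
        return []  # first tick: nothing to replay yet
    learnings = []
    with p.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                learnings.append(json.loads(line))
    return learnings
```

Replaying in file order means the most recent decision on a topic is seen last, which is one simple way to keep the cron from reopening settled issues.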

## Cron updates (session-only, c5074cd5 + 060d136c)

- Hourly triage cron now opens with careful-mode activation +
  cron-learnings replay (Step 0)
- code-review skill on every PR being considered for merge
  (Step 2 supplement A — already present, formalized)
- cross-vendor-review on noteworthy PRs (Step 2 supplement B — new)
- llm-judge on issue-pickup draft PRs before marking ready (Step 4)
- Status report now includes cross-vendor pass/fail and llm-judge
  scores (Step 5)
- End-of-tick cron-learnings append (Step 5)
- New weekly cron at Sun 23:07 invokes the cron-retro skill

## What we did NOT take from gstack

- Their browser fork — not our product
- The 23 named roles — we have agent role templates already
- Bun toolchain — adds yet another runtime to our stack
- /design-shotgun and design-tool variants — we're not a design tool
- /document-release — our update-docs skill already covers this

See PR description for full research notes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:36:55 -07:00
Hongming Wang
479f1776a8 feat(provisioner): configurable per-tier memory/CPU limits (#14)
Resolves #14. ApplyTierConfig now reads TIER{2,3,4}_MEMORY_MB and
TIER{2,3,4}_CPU_SHARES env vars, falling back to the compiled defaults
agreed in the issue:

  - T2: 512 MiB  / 1024 shares (1 CPU)  — unchanged baseline
  - T3: 2048 MiB / 2048 shares (2 CPU)  — new cap (previously uncapped)
  - T4: 4096 MiB / 4096 shares (4 CPU)  — new cap (previously uncapped)

CPU_SHARES follows Docker's 1024 = 1 CPU convention; internally the
value is translated to NanoCPUs for a hard allocation so behaviour
remains deterministic across hosts. Malformed or non-positive env
values silently fall back to the default.
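
The fallback and the shares-to-NanoCPUs arithmetic can be sketched as follows. This is a Python illustration of the rules described above, not the Go code in ApplyTierConfig; the function names are hypothetical and the exact parsing rules of the real implementation may differ:

```python
import os

# Compiled defaults from this commit: tier -> (memory MiB, CPU shares).
DEFAULTS = {2: (512, 1024), 3: (2048, 2048), 4: (4096, 4096)}
NANO_CPUS_PER_CPU = 1_000_000_000  # Docker expresses NanoCPUs in units of 1e-9 CPU
SHARES_PER_CPU = 1024              # Docker convention: 1024 shares = 1 CPU


def _positive_int_env(name, default, env):
    """Return env[name] as a positive int, or `default` if unset, malformed, or <= 0."""
    try:
        value = int(env.get(name))
    except (TypeError, ValueError):
        return default
    return value if value > 0 else default


def tier_limits(tier, env=None):
    """Return (memory_mb, cpu_shares, nano_cpus) for tier 2, 3, or 4."""
    env = os.environ if env is None else env
    mem_default, shares_default = DEFAULTS[tier]
    memory_mb = _positive_int_env(f"TIER{tier}_MEMORY_MB", mem_default, env)
    shares = _positive_int_env(f"TIER{tier}_CPU_SHARES", shares_default, env)
    nano_cpus = shares * NANO_CPUS_PER_CPU // SHARES_PER_CPU
    return memory_mb, shares, nano_cpus
```

Under these assumptions the T3 default of 2048 shares maps to 2,000,000,000 NanoCPUs (2 CPUs), and an override of 512 shares maps to half a CPU.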

Behaviour change note: T3 and T4 previously had no explicit cap.
Operators who relied on unlimited resources can set very large
TIERn_MEMORY_MB / TIERn_CPU_SHARES values; a follow-up can add
unset-means-unlimited semantics if required.

Tests:
  - TestGetTierMemoryMB_DefaultsMatchLegacy
  - TestGetTierMemoryMB_EnvOverride (covers malformed + zero fallback)
  - TestGetTierCPUShares_EnvOverride
  - TestApplyTierConfig_T3_UsesEnvOverride (wiring)
  - TestApplyTierConfig_T3_DefaultCap (documents the new cap)

Docs: .env.example section + CLAUDE.md platform env-vars list updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:49:37 -07:00
Hongming Wang
7ad3173c10 fix(provisioner): preserve Claude session directory across restart (#12)
Resolves #12. The claude-code SDK stores conversations in
/root/.claude/sessions/ and Postgres tracks current_session_id, but the
container filesystem was recreated on every restart, so the next agent
message failed with "No conversation found with session ID: <uuid>".

Add a per-workspace named Docker volume (ws-<id>-claude-sessions) mounted
read-write at /root/.claude/sessions. Gated by runtime=claude-code so
other runtimes don't pay for a path they don't use. Volume is cleaned up
in RemoveVolume alongside the config volume.

Two opt-outs discard the volume before restart for a fresh session:
  - env WORKSPACE_RESET_SESSION=1 on the container
  - POST /workspaces/:id/restart?reset=true (or {"reset": true} body)

Plumbed via new ResetClaudeSession field on WorkspaceConfig +
provisionWorkspaceOpts helper so the flag stays request-scoped (not
persisted on CreateWorkspacePayload).
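
The mount gating and the two reset opt-outs can be summarized in a short sketch. It is written in Python for illustration only (the provisioner itself is not Python), the function names are hypothetical, and it accepts only the literal values spelled out above (`1`, `reset=true`, `{"reset": true}`); the real handler may accept more:

```python
SESSIONS_PATH = "/root/.claude/sessions"


def session_volume_name(workspace_id):
    """Per-workspace named volume, following the ws-<id>-claude-sessions pattern."""
    return f"ws-{workspace_id}-claude-sessions"


def session_mount(workspace_id, runtime):
    """Return (volume, mount_path) for claude-code workspaces, else None."""
    if runtime != "claude-code":
        return None  # other runtimes skip the mount entirely
    return (session_volume_name(workspace_id), SESSIONS_PATH)


def should_reset_session(container_env, query, body):
    """True when either opt-out asks for a fresh session before restart."""
    if container_env.get("WORKSPACE_RESET_SESSION") == "1":
        return True
    if query.get("reset") == "true":
        return True
    return body.get("reset") is True
```

Keeping the reset decision as a pure function of the request and container environment mirrors the request-scoped design: nothing about the reset needs to be stored with the workspace.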

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:45:30 -07:00
Hongming Wang
dcf8a07887 docs: sync documentation with 2026-04-14 tick-3 merges (#53, #54, #55)
- docs/edit-history/2026-04-14.md: append tick-3 section covering the
  admin test-token route (#53), the prior-tick doc-sync PR (#54), and
  the hermes required_env alignment (#55). Record measured test counts
  (Go +4 for the TestAdminTestToken_* quartet).
- CLAUDE.md: bump Go test count 695 → 699 with a note pointing at the
  new quartet. Route-table row and env-var mentions for the admin
  route already landed with #53; verified on main.
- .env.example: add MOLECULE_ENABLE_TEST_TOKENS with a comment about
  the prod-hidden default. Closes the code-review doc-sync flag from
  #53 (var was in CLAUDE.md but missing from .env.example).

No PLAN.md / README.md / README.zh-CN.md update needed — none of the
three merges expose a user-visible surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:37:42 -07:00