Three changes to stop ferrying sensitive content through our public
monorepo. All content already imported to Molecule-AI/internal (private)
— see linked PRs below.
## docs/incidents/INCIDENT_LOG.md — replaced with stub
Contained full security audit cycle records with CWE references,
file:line pointers to historical vulnerabilities, and severity
ratings. None of that belongs in a public repo.
→ Moved to Molecule-AI/internal/security/incident-log.md (PR #20).
Monorepo file becomes a 17-line stub pointing at the internal
location. Future incidents land in the internal file only.
## docs/architecture/canary-release.md — redacted identifiers
Had AWS account ID `004947743811` and IAM role name
`MoleculeStagingProvisioner` embedded. Even though the fleet
described isn't actually running (see state note), these
identifiers are account-specific and don't belong in public git.
→ Removed both values, replaced with generic references + a pointer
to Molecule-AI/internal/runbooks/canary-fleet.md (PR #21) where
the actual identifiers live. Any future rotation touches the
internal file, no public-git-history rewrite needed.
## docs/infra/workspace-terminal.md — reduced to public summary
Contained the full ops runbook: bootstrap script output, the
per-tenant SG backfill loop with live SG IDs, and a customer slug
(hongmingwang). Useful content, but too specific for a public repo.
→ Moved to Molecule-AI/internal/runbooks/workspace-terminal.md
(PR #22). Monorepo file becomes a 30-line public summary of what
the feature does + pointers to code, so external readers /
self-hosters still get the design story.
## What's NOT in this PR (follow-up)
Marketing briefs, SEO plans, campaign copy, research dossiers, and
internal product designs (hermes-adapter-plan, medo-integration,
cognee-*) are the next batches. A docs policy doc follows next to
set team expectations.
Net removal: ~820 lines from public git going forward.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The canary-release.md doc describes the pipeline as if the fleet is
running — referring to AWS account 004947743811 and a configured
MoleculeStagingProvisioner role. Reality as of 2026-04-22: no canary
tenants are provisioned, the 3 GH Actions secrets are empty, and
canary-verify.yml has failed 7/7 times in a row.
Added a top-of-doc ⚠️ state note that:
1. Clarifies this is intended design, not deployed reality.
2. Notes the AWS account ID is historical / unverified.
3. Explains that merges currently rely on manual promote-latest.
4. Cross-links to molecule-controlplane/docs/canary-tenants.md for
the Phase 1 work that's shipped, the Phase 2 stand-up plan, and
the "should we even do this now?" decision framework.
5. Asks whoever lands Phase 2 to reconcile the two docs.
No behaviour change — doc-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The monorepo docs/ tree is ecosystem + user-facing. Internal
roadmap ("what we'll build next", priorities, effort estimates)
doesn't belong there — customers reading our docs don't need our
backlog in their face, and we shouldn't signal "feature X is
coming" as if it were a commitment when it's just a P2 item in
internal tracking.
Removes:
- docs/architecture/org-api-keys-followups.md (the whole
prioritized roadmap). Moved to the internal repo at
runbooks/org-api-keys-followups.md where it belongs.
- "Follow-up roadmap" section in docs/architecture/org-api-
keys.md, replaced with a shorter "Known limitations" section
that names the current constraints (full-admin only, no
expiry, no user_id in session-minted audit) without
speculating on when they'll change.
- "What's coming" section in docs/guides/org-api-keys.md,
replaced with "Current limits" that names the same
constraints from the user's POV.
Public docs now describe the feature as it exists TODAY. Internal
tracking of what comes next lives in Molecule-AI/internal (private).
Extends WorkspaceAuth to accept org API tokens as a valid
credential for any workspace sub-route in the org. Previously, a
user minting an org token could hit admin-surface endpoints
(/workspaces, /org/import, etc.) but couldn't reach per-workspace
routes like /workspaces/:id/channels — those were gated by
WorkspaceAuth, which only knew about workspace-scoped tokens.
Scope matches the explicit product spec: one org API key can
manipulate every workspace in the org. AI agents given a key can
read/write channels, tokens, schedules, secrets, tasks across all
workspaces.
## WorkspaceAuth tier order
1. ADMIN_TOKEN exact match (break-glass / bootstrap)
2. Org API token, validated against org_api_tokens (NEW)
3. Workspace-scoped token (ValidateToken with :id binding)
4. Same-origin canvas referer
The org token tier sits above the per-workspace check so a
presenter of an org key doesn't hit the narrower ValidateToken
failure path first. Verified that the isSameOriginCanvas path is
unchanged.
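
To make the ordering concrete, here's a minimal Go sketch of the
tier walk. The stub helpers (validateOrgToken,
validateWorkspaceToken, bearerToken) are illustrative stand-ins,
not the real package internals:

```go
package authz

import (
	"crypto/subtle"
	"net/http"
	"os"
	"strings"
)

// Illustrative stubs standing in for the real package internals.
func validateOrgToken(token string) bool             { return false } // lookup in org_api_tokens
func validateWorkspaceToken(token, wsID string) bool { return false } // ValidateToken with :id binding
func isSameOriginCanvas(r *http.Request) bool        { return false }

func bearerToken(r *http.Request) string {
	return strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
}

// WorkspaceAuth walks the four tiers in the order listed above, so an
// org key is accepted before the narrower workspace-token check can fail.
func WorkspaceAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		token := bearerToken(r)
		admin := os.Getenv("ADMIN_TOKEN")
		switch {
		case admin != "" && subtle.ConstantTimeCompare([]byte(token), []byte(admin)) == 1:
			// Tier 1: break-glass / bootstrap admin token, exact match.
		case validateOrgToken(token):
			// Tier 2 (NEW): org API token, valid for every workspace in the org.
		case validateWorkspaceToken(token, r.PathValue("id")):
			// Tier 3: workspace-scoped token bound to the :id in the path.
		case isSameOriginCanvas(r):
			// Tier 4: same-origin canvas referer, unchanged.
		default:
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```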
## End-to-end verified
Minted a test token via ADMIN_TOKEN, then hit each of these with
that org token:
- GET /workspaces → 200 (list all)
- GET /workspaces/<id> → 200 (detail, admin-only route)
- GET /workspaces/<id>/channels → 200 (workspace sub-route)
- GET /workspaces/<id>/tokens → 200 (workspace tokens list)
- GET /workspaces/<bad-uuid> → 404 workspace not found
(routing still scoped correctly)
## Documentation
- docs/architecture/org-api-keys.md — design, data model, threat
model, security properties
- docs/architecture/org-api-keys-followups.md — 10 tracked
follow-ups prioritized (role scoping P1, per-workspace binding
P1, expiry P2, usage metrics P2, WorkOS user_id capture P2,
rotation webhooks P3, mint-rate limit P3, audit log P2, CLI
P3, migrate ADMIN_TOKEN to the same table P4)
- docs/guides/org-api-keys.md — end-user guide (mint via UI,
use in curl/Python/TS/AI agents, session-vs-key comparison)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the canary loop with the escape hatch and a single place to
read about the whole flow.
scripts/rollback-latest.sh <sha>
Uses crane to retag :latest ← :staging-<sha> for BOTH the platform
and tenant images. Pre-checks that the target tag exists and
verifies the :latest digest after the move, so a bad ops typo
doesn't silently promote the wrong thing. Prod tenants auto-update
to the rolled-back digest within their 5-min cycle. Exit codes:
0 = both retagged, 1 = registry/tag error, 2 = usage error.
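
For readers who want the flow without opening the script, here is
the same pre-check → retag → verify sequence sketched in Go against
go-containerregistry, the library the crane CLI wraps. The repo
names are placeholders; the shipped tool is the shell script above:

```go
package main

import (
	"fmt"
	"os"

	"github.com/google/go-containerregistry/pkg/crane"
)

// rollback retags repo:latest to point at repo:staging-<sha>, with the
// same pre-check and post-verify the script does.
func rollback(repo, sha string) error {
	src := fmt.Sprintf("%s:staging-%s", repo, sha)
	dst := repo + ":latest"

	want, err := crane.Digest(src) // pre-check: target tag must exist
	if err != nil {
		return fmt.Errorf("target tag %s not found: %w", src, err)
	}
	if err := crane.Copy(src, dst); err != nil { // retag :latest <- :staging-<sha>
		return err
	}
	got, err := crane.Digest(dst) // verify the move actually landed
	if err != nil || got != want {
		return fmt.Errorf("digest mismatch after retag: got %s, want %s", got, want)
	}
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: rollback-latest <sha>")
		os.Exit(2) // 2 = usage error, matching the script
	}
	// Both images move together, as in the script; repo names are placeholders.
	for _, repo := range []string{"registry.example/platform", "registry.example/platform-tenant"} {
		if err := rollback(repo, os.Args[1]); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1) // 1 = registry/tag error
		}
	}
}
```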
docs/architecture/canary-release.md
The one-page map of the pipeline: how PR → main → staging-<sha> →
canary smoke → :latest promotion works end-to-end, how to add a
canary tenant, how to roll back, and what this gate explicitly does
NOT catch (prod-only data, config drift, cross-tenant bugs).
No code changes in the CP or workspace-server — this PR is shell
+ docs only, so it's safe to land independently of the other Phase
{1,1.5,2,3} PRs still in review.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the 10-PR staging→main cutover: what shipped, the three new
Railway prod env vars (PROVISION_SHARED_SECRET / EC2_VPC_ID /
CP_BASE_URL), and the sharp edge for existing tenants — their
containers pre-date PR #53, so they still need
MOLECULE_CP_SHARED_SECRET added manually (or a re-provision) before
the new CPProvisioner's outbound bearer works.
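
A minimal sketch of why the missing var bites, assuming the tenant
verifies the CP's bearer against MOLECULE_CP_SHARED_SECRET (the
function name is hypothetical, not the actual tenant code):

```go
package tenant

import (
	"crypto/subtle"
	"net/http"
	"os"
	"strings"
)

// cpAuthorized is a hypothetical stand-in for the tenant-side check: the
// CPProvisioner sends "Authorization: Bearer <secret>", and a container
// that never got MOLECULE_CP_SHARED_SECRET rejects every such call.
func cpAuthorized(r *http.Request) bool {
	secret := os.Getenv("MOLECULE_CP_SHARED_SECRET")
	got := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
	return secret != "" &&
		subtle.ConstantTimeCompare([]byte(got), []byte(secret)) == 1
}
```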
Also includes a post-deploy verification checklist and rollback plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE.md was a 44KB catch-all mixing architecture docs (useful for
everyone) with agent operating instructions (internal). Split:
- docs/architecture/overview.md — system architecture, component
descriptions, 13 key patterns (import cycles, health detection,
communication rules, WebSocket flow, lifecycle, etc.)
- docs/api-reference.md — full REST API route table + database schema
- CLAUDE.md → gitignored (stays local for agent tooling)
All internal PR/issue references stripped from the new docs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove compiled workspace-server/server binary from git
- Fix .gitignore, .gitattributes, .githooks/pre-commit for renamed dirs
- Fix CI workflow path filters (workspace-template → workspace)
- Replace real EC2 IP and personal slug in test_saas_tenant.sh
- Scrub molecule-controlplane references in docs
- Fix stale workspace-template/ paths in provisioner, handlers, tests
- Clean tracked Python cache files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Security:
- Replace hardcoded Cloudflare account/zone/KV IDs in wrangler.toml
with placeholders; add wrangler.toml to .gitignore, ship a .example
- Replace real EC2 IPs in docs with <EC2_IP> placeholders
- Redact partial CF API token prefix in retrospective
- Parameterize Langfuse dev credentials in docker-compose.infra.yml
- Replace Neon project ID in runbook with <neon-project-id>
Community:
- Add CONTRIBUTING.md (build, test, branch conventions, CI info)
- Add CODE_OF_CONDUCT.md (Contributor Covenant 2.1)
Cleanup:
- Replace personal runner username/machine name in CI + PLAN.md
- Replace personal tenant URL in MCP setup guide
- Replace personal author field in bundle-system doc
- Replace personal login in webhook test fixture
- Rewrite cryptominer incident reference as generic security remediation
- Remove private repo commit hashes from PLAN.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full staging environment that mirrors production. Every infra change
ships to staging before being promoted. Gates Phase 33 (Tunnel) and
Phase 35 (security hardening).
Components: Railway staging env, Neon branch, staging DNS, tagged
Docker images, promotion workflow, automated smoke tests.
Also marks Phase 33 as migrating from Worker to Cloudflare Tunnel
(issue #933); staging is a prerequisite.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents three upgrade strategies for keeping tenant EC2 instances
current with platform-tenant:latest:
- Option A: Rolling restart via CP admin endpoint (coordinated)
- Option B: Sidecar auto-updater cron (implemented, 5-min interval; loop sketched below)
- Option C: Blue-green via Worker (zero downtime, future)
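
A rough Go sketch of Option B's loop, assuming a docker-CLI sidecar
with a compose file; the image name and compose path are
placeholders, not the shipped updater:

```go
package main

import (
	"log"
	"os/exec"
	"time"

	"github.com/google/go-containerregistry/pkg/crane"
)

func main() {
	// Placeholder names: the real sidecar's image and compose path differ.
	const image = "registry.example/platform-tenant:latest"
	var running string // digest the tenant container was started from

	for range time.Tick(5 * time.Minute) {
		latest, err := crane.Digest(image)
		if err != nil {
			log.Printf("digest check failed: %v", err) // transient; retry next tick
			continue
		}
		if latest == running {
			continue // :latest hasn't moved
		}
		// Pull the new digest, then let compose recreate the container
		// (compose only recreates when the pulled image actually changed).
		if err := exec.Command("docker", "pull", image).Run(); err != nil {
			log.Printf("pull failed: %v", err)
			continue
		}
		if err := exec.Command("docker", "compose", "-f", "/opt/tenant/compose.yml", "up", "-d").Run(); err != nil {
			log.Printf("recreate failed: %v", err)
			continue
		}
		running = latest
		log.Printf("tenant updated to %s", latest)
	}
}
```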
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses all 4 review points from PR #786:
1. Worker resilience: 3-tier cache (in-memory → KV → CP API) with stale
fallback so CP outages are invisible to tenants (lookup order sketched
after this list)
2. WebSocket proxying: documented upgradeHeader handling, with a
fallback to keep Caddy for WS-only traffic if Workers WS proves
unreliable
3. SG automation: note to auto-update Cloudflare IP ranges instead of
hardcoding them
4. Trusted proxy: X-Forwarded-For / CF-Connecting-IP trust chain documented
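
The production Worker is JS; this Go sketch only pins down the tier
walk and the stale-fallback rule from point 1 (type names and TTL
handling are illustrative):

```go
package proxy

import "time"

// KV stands in for Workers KV in this sketch.
type KV interface {
	Get(key string) (Route, bool)
	Put(key string, r Route)
}

// Route is what the Worker needs to proxy one tenant.
type Route struct {
	Origin  string    // upstream for this tenant
	Fetched time.Time // when the CP last confirmed it
}

// Resolver walks the three tiers in order and falls back to stale data
// when the CP is down.
type Resolver struct {
	mem map[string]Route                   // tier 1: in-memory, per-isolate
	kv  KV                                 // tier 2: KV
	cp  func(tenant string) (Route, error) // tier 3: CP API
	ttl time.Duration
}

func (rs *Resolver) Resolve(tenant string) (Route, error) {
	if r, ok := rs.mem[tenant]; ok && time.Since(r.Fetched) < rs.ttl {
		return r, nil // fresh in-memory hit
	}
	if r, ok := rs.kv.Get(tenant); ok && time.Since(r.Fetched) < rs.ttl {
		rs.mem[tenant] = r // warm the faster tier
		return r, nil
	}
	r, err := rs.cp(tenant)
	if err != nil {
		// Stale fallback: serve the last-known route so a CP outage is
		// invisible to tenants instead of a hard failure.
		if old, ok := rs.mem[tenant]; ok {
			return old, nil
		}
		if old, ok := rs.kv.Get(tenant); ok {
			return old, nil
		}
		return Route{}, err
	}
	r.Fetched = time.Now()
	rs.mem[tenant] = r
	rs.kv.Put(tenant, r)
	return r, nil
}
```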
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds Phase 33 plan and architecture doc for replacing per-tenant DNS
records with a wildcard DNS + Cloudflare Worker proxy pattern.
Eliminates: DNS propagation delays, NXDOMAIN caching, per-instance
Let's Encrypt, Caddy on EC2. Same pattern used by Vercel, Railway,
Fly.io, WordPress, n8n.
4-phase migration: deploy Worker → stop creating DNS records →
remove Caddy from EC2 → cleanup.
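
The pattern in miniature, sketched in Go (the real proxy is a
Cloudflare Worker; lookupOrigin is a stand-in for the KV/CP route
lookup):

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// lookupOrigin is a stand-in for the tenant-route lookup (KV/CP in the
// actual design): slug -> the tenant's EC2 origin.
func lookupOrigin(tenant string) (*url.URL, bool) {
	return nil, false
}

func main() {
	http.ListenAndServe(":8080", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// One wildcard DNS record sends *.example.com here; the tenant is
		// just the first label of the Host header.
		host := strings.SplitN(r.Host, ":", 2)[0] // drop any port
		tenant := strings.SplitN(host, ".", 2)[0]
		origin, ok := lookupOrigin(tenant)
		if !ok {
			http.Error(w, "unknown tenant", http.StatusNotFound)
			return
		}
		httputil.NewSingleHostReverseProxy(origin).ServeHTTP(w, r)
	}))
}
```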
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>