Commit Graph

14 Commits

Author SHA1 Message Date
Hongming Wang
97dbe1c987 feat(canary): rollback-latest script + release-pipeline doc (Phase 4)
Closes the canary loop with the escape hatch and a single place to
read about the whole flow.

scripts/rollback-latest.sh <sha>
  uses crane to retag :latest ← :staging-<sha> for BOTH the platform
  and tenant images. Pre-checks the target tag exists and verifies
  the :latest digest after the move so a bad ops typo doesn't
  silently promote the wrong thing. Prod tenants auto-update to the
  rolled-back digest within their 5-min cycle. Exit codes: 0 = both
  retagged, 1 = registry/tag error, 2 = usage error.

docs/architecture/canary-release.md
  The one-page map of the pipeline: how PR → main → staging-<sha> →
  canary smoke → :latest promotion works end-to-end, how to add a
  canary tenant, how to roll back, and what this gate explicitly does
  NOT catch (prod-only data, config drift, cross-tenant bugs).

No code changes in the CP or workspace-server — this PR is shell
+ docs only, so it's safe to land independently of the other Phase
{1,1.5,2,3} PRs still in review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 03:37:42 -07:00
Hongming Wang
ca5a5f1a7f docs: 2026-04-19 SaaS prod migration notes
Captures the 10-PR staging→main cutover: what shipped, the three new
Railway prod env vars (PROVISION_SHARED_SECRET / EC2_VPC_ID /
CP_BASE_URL), and the sharp edge for existing tenants — their
containers pre-date PR #53 so they still need MOLECULE_CP_SHARED_SECRET
added manually (or a re-provision) before the new CPProvisioner's
outbound bearer works.

Also includes a post-deploy verification checklist and rollback plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 02:29:31 -07:00
Hongming Wang
aaa6a4db83 fix(docs): update architecture + API reference paths for workspace-server rename
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 01:25:21 -07:00
Hongming Wang
0d3c57cced chore: gitignore CLAUDE.md, extract content to proper docs
CLAUDE.md was a 44KB catch-all mixing architecture docs (useful for
everyone) with agent operating instructions (internal). Split:

- docs/architecture/overview.md — system architecture, component
  descriptions, 13 key patterns (import cycles, health detection,
  communication rules, WebSocket flow, lifecycle, etc.)
- docs/api-reference.md — full REST API route table + database schema
- CLAUDE.md → gitignored (stays local for agent tooling)

All internal PR/issue references stripped from the new docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 00:43:33 -07:00
Hongming Wang
39074cc4ae chore: final open-source cleanup — binary, stale paths, private refs
- Remove compiled workspace-server/server binary from git
- Fix .gitignore, .gitattributes, .githooks/pre-commit for renamed dirs
- Fix CI workflow path filters (workspace-template → workspace)
- Replace real EC2 IP and personal slug in test_saas_tenant.sh
- Scrub molecule-controlplane references in docs
- Fix stale workspace-template/ paths in provisioner, handlers, tests
- Clean tracked Python cache files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 00:38:55 -07:00
Hongming Wang
d8026347e5 chore: open-source restructure — rename dirs, remove internal files, scrub secrets
Renames:
- platform/ → workspace-server/ (Go module path stays as "platform" for
  external dep compat — will update after plugin module republish)
- workspace-template/ → workspace/

Removed (moved to separate repos or deleted):
- PLAN.md — internal roadmap (move to private project board)
- HANDOFF.md, AGENTS.md — one-time internal session docs
- .claude/ — gitignored entirely (local agent config)
- infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy
- org-templates/molecule-dev/ → standalone template repo
- .mcp-eval/ → molecule-mcp-server repo
- test-results/ — ephemeral, gitignored

Security scrubbing:
- Cloudflare account/zone/KV IDs → placeholders
- Real EC2 IPs → <EC2_IP> in all docs
- CF token prefix, Neon project ID, Fly app names → redacted
- Langfuse dev credentials → parameterized
- Personal runner username/machine name → generic

Community files:
- CONTRIBUTING.md — build, test, branch conventions
- CODE_OF_CONDUCT.md — Contributor Covenant 2.1

All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml,
README, CLAUDE.md updated for new directory names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 00:24:44 -07:00
Hongming Wang
295c4d930a chore: open-source preparation — scrub secrets, add community files
Security:
- Replace hardcoded Cloudflare account/zone/KV IDs in wrangler.toml
  with placeholders; add wrangler.toml to .gitignore, ship .example
- Replace real EC2 IPs in docs with <EC2_IP> placeholders
- Redact partial CF API token prefix in retrospective
- Parameterize Langfuse dev credentials in docker-compose.infra.yml
- Replace Neon project ID in runbook with <neon-project-id>

Community:
- Add CONTRIBUTING.md (build, test, branch conventions, CI info)
- Add CODE_OF_CONDUCT.md (Contributor Covenant 2.1)

Cleanup:
- Replace personal runner username/machine name in CI + PLAN.md
- Replace personal tenant URL in MCP setup guide
- Replace personal author field in bundle-system doc
- Replace personal login in webhook test fixture
- Rewrite cryptominer incident reference as generic security remediation
- Remove private repo commit hashes from PLAN.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 00:10:56 -07:00
Hongming Wang
b0eed5135f fix: resolve PLAN.md merge conflict — keep both Phase 34 and Phase 36
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 21:41:32 -07:00
Hongming Wang
a873ae0eae docs: staging environment design + Phase 36 plan
Full staging environment that mirrors production. Every infra change
ships to staging first before promotion. Gates Phase 33 (Tunnel) and
Phase 35 (security hardening).

Components: Railway staging env, Neon branch, staging DNS, tagged
Docker images, promotion workflow, automated smoke tests.

Also marks Phase 33 as migrating from Worker to Cloudflare Tunnel
(issue #933), prerequisite: staging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 20:37:11 -07:00
Hongming Wang
7094290850 docs: Partner API Keys architecture + Phase 34 plan
Adds programmatic org management for partner platforms, CI/CD, and
automation. Partners authenticate with mol_pk_* API keys (SHA-256
hashed, scoped, rate-limited, revocable) alongside existing WorkOS
browser auth.

- Full architecture doc with schema, scopes, middleware integration,
  security considerations, and use cases
- Phase 34 in PLAN.md (4 sub-phases)
- CLAUDE.md cross-reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 14:07:50 -07:00
Hongming Wang
20750cf128 docs: tenant image upgrade strategies (Options A/B/C)
Documents three upgrade strategies for keeping tenant EC2 instances
current with platform-tenant:latest:
- Option A: Rolling restart via CP admin endpoint (coordinated)
- Option B: Sidecar auto-updater cron (implemented, 5 min interval)
- Option C: Blue-green via Worker (zero downtime, future)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 11:59:15 -07:00
Hongming Wang
8c02d2d878 docs(wildcard-dns): address CEO review — KV cache, WebSocket, proxy trust
Addresses all 4 review points from PR #786:
1. Worker resilience: 3-tier cache (in-memory → KV → CP API) with stale
   fallback so CP outages are invisible to tenants
2. WebSocket proxying: documented upgradeHeader handling, fallback to
   keep Caddy for WS-only if Workers WS is unreliable
3. SG automation: note to auto-update Cloudflare IP ranges, don't hardcode
4. Trusted proxy: X-Forwarded-For / CF-Connecting-IP trust chain documented

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 10:17:43 -07:00
Hongming Wang
d36b612bbf docs: wildcard DNS + Cloudflare Worker proxy architecture
Adds Phase 33 plan and architecture doc for replacing per-tenant DNS
records with a wildcard DNS + Cloudflare Worker proxy pattern.

Eliminates: DNS propagation delays, NXDOMAIN caching, per-instance
Let's Encrypt, Caddy on EC2. Same pattern used by Vercel, Railway,
Fly.io, WordPress, n8n.

4-phase migration: deploy Worker → stop creating DNS records →
remove Caddy from EC2 → cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 10:02:32 -07:00
Hongming Wang
24fec62d7f initial commit — Molecule AI platform
Forked clean from public hackathon repo (Starfire-AgentTeam, BSL 1.1)
with full rebrand to Molecule AI under github.com/Molecule-AI/molecule-monorepo.

Brand: Starfire → Molecule AI.
Slug: starfire / agent-molecule → molecule.
Env vars: STARFIRE_* → MOLECULE_*.
Go module: github.com/agent-molecule/platform → github.com/Molecule-AI/molecule-monorepo/platform.
Python packages: starfire_plugin → molecule_plugin, starfire_agent → molecule_agent.
DB: agentmolecule → molecule.

History truncated; see public repo for prior commits and contributor
attribution. Verified green: go test -race ./... (platform), pytest
(workspace-template 1129 + sdk 132), vitest (canvas 352), build (mcp).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:55:37 -07:00