Commit Graph

5 Commits

Author SHA1 Message Date
Hongming Wang
28bf11fb85 docs(security): move sensitive runbooks to private internal repo
Three changes to stop ferrying sensitive content through our public
monorepo. All content already imported to Molecule-AI/internal (private)
— see linked PRs below.

The first file contained full security audit cycle records with CWE
references, file:line pointers to historical vulnerabilities, and
severity ratings. None of that belongs in a public repo.

→ Moved to Molecule-AI/internal/security/incident-log.md (PR #20).
  Monorepo file becomes a 17-line stub pointing at the internal
  location. Future incidents land in the internal file only.

The second file had AWS account ID `004947743811` and IAM role name
`MoleculeStagingProvisioner` embedded. Even though the fleet
described isn't actually running (see state note), these
identifiers are account-specific and don't belong in public git.

→ Removed both values, replaced with generic references + a pointer
  to Molecule-AI/internal/runbooks/canary-fleet.md (PR #21) where
  the actual identifiers live. Any future rotation touches the
  internal file, no public-git-history rewrite needed.

The third file contained the full ops runbook: bootstrap script output,
per-tenant SG backfill loop with live SG IDs, customer slug names
(hongmingwang). Useful content but too specific for a public repo.

→ Moved to Molecule-AI/internal/runbooks/workspace-terminal.md
  (PR #22). Monorepo file becomes a 30-line public summary of what
  the feature does + pointers to code, so external readers /
  self-hosters still get the design story.

Marketing briefs, SEO plans, campaign copy, research dossiers, and
internal product designs (hermes-adapter-plan, medo-integration,
cognee-*) are the next batches. A docs policy doc is coming next to
set team expectations.

Net removal: ~820 lines from public git going forward.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 22:39:23 +00:00
Hongming Wang
9466542212 docs(infra): add tenant env-var section + fix backfill loop split
Review turned up two issues in the rollout runbook:

1. The tenant env-var list was missing — today's debugging burned 2
   hours on hongmingwang where everything worked infra-side but
   canvas 401'd because MOLECULE_ORG_SLUG and CP_UPSTREAM_URL weren't
   set. Without this list, the doc sends the next operator down the
   same hole.

   Added a dedicated step-3 table covering CP_UPSTREAM_URL,
   MOLECULE_ORG_SLUG, MOLECULE_ORG_ID, AWS_REGION with the exact
   failure mode each one produces when missing.

2. Backfill loop used tab-separated aws-cli output directly, which
   can leave all SG ids in a single word, so the loop body runs once
   instead of once per id. Inserted `| tr '\t' '\n'`: a no-op on
   well-behaved output and the fix for the concatenated case.
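
The failure in item 2 can be reproduced without AWS access. The sketch
below assumes the loop read its input line by line (the original loop is
not shown in this log) and uses a made-up stand-in for the aws-cli text
output; `fake_sg_list` and `count_iterations` are hypothetical helpers.

```shell
# Stand-in for `aws ... --output text`: one line, ids separated by tabs.
# The ids are made up for illustration.
fake_sg_list() {
  printf 'sg-0aaa\tsg-0bbb\tsg-0ccc\n'
}

# Count how many times a line-oriented loop body runs over stdin.
count_iterations() {
  n=0
  while IFS= read -r sg; do
    n=$((n + 1))
  done
  echo "$n"
}

# Without tr the whole tab-separated line arrives as a single item.
without_tr=$(fake_sg_list | count_iterations)

# With tr each id lands on its own line, so the body runs once per id.
with_tr=$(fake_sg_list | tr '\t' '\n' | count_iterations)

echo "without tr: $without_tr iteration(s)"   # without tr: 1 iteration(s)
echo "with tr:    $with_tr iteration(s)"      # with tr:    3 iteration(s)
```

Input that is already newline-separated contains no tabs, so the `tr`
stage passes it through unchanged.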

Renumbered subsequent sections.
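
The env-var list from item 1 could also be checked before a tenant
starts. A minimal sketch, assuming a POSIX shell: the four variable
names come from the runbook change described above, while
`missing_tenant_env` is a hypothetical helper and not something in the
repo.

```shell
# Variable names are taken from the commit message; the check itself is
# an illustrative assumption.
required_tenant_env="CP_UPSTREAM_URL MOLECULE_ORG_SLUG MOLECULE_ORG_ID AWS_REGION"

# Print the name of every required variable that is unset or empty.
missing_tenant_env() {
  for name in $required_tenant_env; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
    fi
  done
}

missing=$(missing_tenant_env)
if [ -n "$missing" ]; then
  # Report on stderr; a real preflight would probably stop here.
  echo "missing tenant env vars:" $missing >&2
fi
```

Running this before startup would surface a missing variable as a
named error instead of a downstream 401.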

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 21:01:30 -07:00
Hongming Wang
456b8fd184 docs(infra): workspace-terminal runbook with verified commands
Expanded the rollout section with the exact scripts + env vars
that landed to make Hermes workspace Terminal work on 2026-04-22.
Points at molecule-controlplane#227 (which adds bootstrap script +
EIC_ENDPOINT_SG_ID env var) so operators can reproduce the setup
on a new AWS account in one command.

Also documents the existing-workspace backfill for the instance_id
column — the CP only writes on new provisions, so pre-migration
workspaces need a manual UPDATE before Terminal routes to the
remote path.
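
The manual backfill might look roughly like the sketch below. The
`workspaces` table and `instance_id` column are named in this log; the
`id` key column, the helper name `backfill_sql`, and the sample ids are
assumptions. The helper only prints a statement so it can be reviewed
before being run against the database.

```shell
# Print a backfill statement for one pre-migration workspace.
# Assumption: workspaces are keyed by a column called `id`.
backfill_sql() {
  ws_id=$1
  instance_id=$2
  valid=yes

  # Accept only EC2-style instance ids: "i-" followed by lowercase hex.
  case $instance_id in
    i- | i-*[!0-9a-f]*) valid=no ;;
    i-*) ;;
    *) valid=no ;;
  esac

  # Keep the workspace id to a conservative character set so it is safe
  # to place inside a quoted SQL literal.
  case $ws_id in
    "" | *[!A-Za-z0-9_-]*) valid=no ;;
  esac

  if [ "$valid" = no ]; then
    echo "refusing to build statement for: $ws_id $instance_id" >&2
    return 1
  fi

  # The IS NULL guard keeps a re-run from overwriting a value the CP
  # has already written.
  printf "UPDATE workspaces SET instance_id = '%s' WHERE id = '%s' AND instance_id IS NULL;\n" \
    "$instance_id" "$ws_id"
}

backfill_sql ws-123 i-0abc123def4567890
```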

Refs: #1528 (resolved)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 19:50:59 -07:00
Hongming Wang
1e47f85495 docs(infra): fix workspace-terminal doc against real CP code
Researched the actual molecule-controlplane repo rather than guessing:
- Workspaces launch in a shared CP workspace VPC (p.VPCID), not per
  tenant
- CP already tags instances with Role=workspace at ec2.go:1126 — my
  prior IAM policy used molecule:role which doesn't match anything
- workspaceIngressRules() currently opens only 8000/tcp — no port 22

Corrected:
- IAM policy Condition now matches existing Role tag (no CP change
  needed for the scope to work fleet-wide)
- Added OpenTunnel action so EIC Endpoint path works
- Dropped the "open 22 in SG" recommendation. Cross-VPC topology
  makes SG CIDR rules awkward (would need peering + tenant-CIDR
  bookkeeping). EIC Endpoint is one VPC resource + no SG changes.
- Simplified rollout to two items: add IAM policy, create EIC Endpoint

Kept direct-SG path as an explicit not-recommended alternative.
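
A policy with the corrected shape could look like the sketch below. The
`Role=workspace` tag and the OpenTunnel action come from this commit;
the `SendSSHPublicKey` statement, the `Sid` values, and the
REGION / ACCOUNT_ID / EICE_ID placeholders are assumptions, and the
policy that actually shipped may differ.

```shell
# Print an illustrative IAM policy document. REGION, ACCOUNT_ID and
# EICE_ID are placeholders to be filled in per account.
eic_policy_sketch() {
  cat <<'JSON'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PushKeyToWorkspaceInstances",
      "Effect": "Allow",
      "Action": "ec2-instance-connect:SendSSHPublicKey",
      "Resource": "arn:aws:ec2:REGION:ACCOUNT_ID:instance/*",
      "Condition": {
        "StringEquals": { "aws:ResourceTag/Role": "workspace" }
      }
    },
    {
      "Sid": "TunnelThroughEicEndpoint",
      "Effect": "Allow",
      "Action": "ec2-instance-connect:OpenTunnel",
      "Resource": "arn:aws:ec2:REGION:ACCOUNT_ID:instance-connect-endpoint/EICE_ID"
    }
  ]
}
JSON
}

eic_policy_sketch
```

Scoping the first statement by the tag the CP already writes is what
lets the policy apply fleet-wide without a CP change.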

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 18:05:24 -07:00
Hongming Wang
46a8d24b2d feat(workspace): persist CP-returned EC2 instance_id on provision
Foundation for the EIC-based terminal handler (#1528). The tenant's
workspace-server needs to map workspace_id → EC2 instance_id to open
an SSH session, but CPProvisioner.Start returned the instance id only
for logging — it was never written anywhere. This PR adds the column
and writes it at provision time.

Scope kept intentionally small: no terminal code yet. The follow-up
PR will consume this column from the terminal handler.

What's here:
- migrations/038_workspace_instance_id — nullable TEXT column on
  workspaces, partial index on non-null for fast lookup
- workspace_provision.go — UPDATE after CPProvisioner.Start; failure
  logs but doesn't fail provisioning (row just lacks instance_id and
  terminal falls back to the existing not-reachable error)
- docs/infra/workspace-terminal.md — full design for the terminal
  flow: EIC vs SSM comparison, IAM policy JSON, SG rules, key
  lifetime, failure modes, rollout checklist
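
The migration described in the first bullet could be sketched as
follows. Only the table name, the column name, its nullable TEXT type,
and the partial index on non-null values come from this commit; the
index name and the helper `migration_038_sketch` are hypothetical, and
the real migration file may be written differently.

```shell
# Print an illustrative version of the migration SQL.
migration_038_sketch() {
  cat <<'SQL'
-- Nullable column: rows provisioned before this change stay NULL.
ALTER TABLE workspaces ADD COLUMN instance_id TEXT;

-- Partial index: only rows that have an instance id are indexed.
CREATE INDEX idx_workspaces_instance_id
  ON workspaces (instance_id)
  WHERE instance_id IS NOT NULL;
SQL
}

migration_038_sketch
```

Leaving the column nullable matches the provisioning behaviour above: a
failed write leaves the row without an instance_id instead of failing
the provision.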

Refs: #1528
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 17:56:15 -07:00