Class B Hongming-owned CICD red sweep. The Handlers Postgres Integration
workflow has been silently failing on staging push and PRs ever since
#92 fixed the IPv6 flake — the IPv6 fix correctly pinned 127.0.0.1, but
unmasked a deeper issue: with our act_runner global container.network=host
config, multiple concurrent runs of this workflow each tried to bind
0.0.0.0:5432 on the operator host. The first wins; subsequent postgres
service containers exit with `FATAL: could not create any TCP/IP sockets`
+ `Address in use`. Docker auto-removes them (act_runner sets
AutoRemove:true), so by the time `Apply migrations` runs `psql`, the
container is gone — Connection refused, then `failed to remove container:
No such container` at cleanup time.
Per-job container.network override is silently ignored by act_runner
(`--network and --net in the options will be ignored.`), so we sidestep
`services:` entirely. The job container still uses host-net (required
for cache server discovery on the operator's bridge IP). We launch a
sibling postgres on the existing molecule-monorepo-net bridge with a
unique name per run (run_id+run_attempt) and connect via the bridge IP
read from `docker inspect`.
Verified manually on operator host 2026-05-08: 2× postgres on host-net
collides, but on the bridge with unique names + different IPs, both
succeed and each is reachable from a host-net job container.
Adds:
- always()-cleanup step so containers don't leak on test failure
- Diagnostic dump now includes the postgres container's docker logs
- Runbook at docs/runbooks/ documenting the substrate behavior + the
pattern future workflows should adopt for any `services:`-shaped need.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove compiled workspace-server/server binary from git
- Fix .gitignore, .gitattributes, .githooks/pre-commit for renamed dirs
- Fix CI workflow path filters (workspace-template → workspace)
- Replace real EC2 IP and personal slug in test_saas_tenant.sh
- Scrub molecule-controlplane references in docs
- Fix stale workspace-template/ paths in provisioner, handlers, tests
- Clean tracked Python cache files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Security:
- Replace hardcoded Cloudflare account/zone/KV IDs in wrangler.toml
with placeholders; add wrangler.toml to .gitignore, ship .example
- Replace real EC2 IPs in docs with <EC2_IP> placeholders
- Redact partial CF API token prefix in retrospective
- Parameterize Langfuse dev credentials in docker-compose.infra.yml
- Replace Neon project ID in runbook with <neon-project-id>
Community:
- Add CONTRIBUTING.md (build, test, branch conventions, CI info)
- Add CODE_OF_CONDUCT.md (Contributor Covenant 2.1)
Cleanup:
- Replace personal runner username/machine name in CI + PLAN.md
- Replace personal tenant URL in MCP setup guide
- Replace personal author field in bundle-system doc
- Replace personal login in webhook test fixture
- Rewrite cryptominer incident reference as generic security remediation
- Remove private repo commit hashes from PLAN.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add comment to .env.example explaining ANTHROPIC_API_KEY must be set
as a *global* secret (not just workspace-level) so SDK-direct workspaces
(e.g. molecule-hitl, hermes) receive it without 401 errors
- Add ANTHROPIC_API_KEY to saas-secrets.md secret map with context on
why global propagation matters
- Add full rotation procedure section (generate → PUT /settings/secrets
→ verify restart → revoke old key) with blast-radius note
Closes#894
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extends the secret map with RESEND_API_KEY, RESEND_FROM_EMAIL,
STRIPE_API_KEY, STRIPE_WEBHOOK_SECRET — the four SaaS secrets the
control plane reads once the current PR stack (#29-#34 on
molecule-controlplane) ships.
Adds rotation procedures for each:
- Resend: low-blast-radius, best-effort sends, domain verification
gotcha documented
- Stripe API key: independent rotation from webhook secret, live verify
via /cp/billing/checkout
- Stripe webhook secret: 24h overlap window procedure using stripe
trigger for live verify
Also adds Resend + Stripe entries to the emergency-contacts list.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents the 4-step hard-delete cascade implemented in
molecule-controlplane PR #29 (Stripe → Redis → Infra → DB rows),
how to read the org_purges audit table when a purge fails, the 30-day
GDPR deadline, and what the cascade deliberately does NOT cover
(WorkOS users, LLM provider history, Langfuse traces).
Cross-referenced from the "SaaS ops" block in CLAUDE.md so future
agents find it when handling erasure requests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses items 4, 5, 7 from the self-review of the batch merge. PR A
(#228) covered items 1, 2, 3, 6 on the Go side.
## workspace-template/main.py — idle loop hardening
- Replace asyncio.get_event_loop() with asyncio.get_running_loop() —
the former is deprecated in 3.12+ and emits a DeprecationWarning on
every idle fire.
- Replace hardcoded urlopen timeout=600 with IDLE_FIRE_TIMEOUT_SECONDS
clamped to max(60, min(300, idle_interval_seconds)). Long cadence
workspaces no longer hold dangling requests open for 10 minutes; the
cap adapts automatically when the interval is short.
- Type the exception handling: split HTTPError (has .code) from URLError
(connection-level) from the generic catch-all. Log status + error
class separately so operators can grep for specific failure modes
instead of a bare "post failed".
- Fire-and-forget no longer loses exceptions. run_in_executor Future
now has an add_done_callback that logs the outcome, so a panic in
_post_sync surfaces as "Idle loop: post failed — status=None err=..."
instead of Python's default "Task exception was never retrieved"
warning burried in stderr.
## org-templates/molecule-dev/org.yaml — discoverability
Added idle_prompt + idle_interval_seconds to the defaults: block with
explanatory comments. Without this, users had to read main.py to
discover the feature.
## docs/runbooks/admin-auth.md — new
Documents the three middleware variants (AdminAuth strict,
CanvasOrBearer soft, WorkspaceAuth per-id), the exact contract of each,
and the three-question test for adding a new route to CanvasOrBearer.
Also flags the session-cookie follow-up as Phase H.
Referenced PRs: #138, #164, #165, #166, #167, #168, #190, #194, #203,
#228.
No code deltas in platform/ beyond the Python + YAML + docs changes.
Full pytest suite unchanged except the pre-existing test_hermes_smoke
flake that fails in full-suite but passes in isolation (test isolation
bug, not introduced by this PR).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses PR #82 code review: 🟡×3 + 🔵×5.
- Fly registry login username: 'x' → 'molecule-ai' + explanatory comment.
- Build & push split into two steps (GHCR / Fly registry) so a single-
registry outage can't fail the other. Second step uses 'if: always()'
to ensure Fly mirror runs even if GHCR push flakes.
- docs/runbooks/saas-secrets.md: full secret map + rotation procedures
for every SaaS credential, with danger-case callouts. Documents the
coupled FLY_API_TOKEN (lives in GHA secret AND fly secrets — must be
rotated in both).
- CLAUDE.md: new 'SaaS ops' section linking to the runbook.