08c2bd4d9a
Four real bugs in the regression test, all surfaced by CI:
1) Mock server didn't reliably come up: the port-probe didn't use
SO_REUSEADDR (so a freed probe port could TIME_WAIT the server's
bind), and the readiness wait was a chained curl+grep shell
pipeline (racy pipe-handle interactions under CI load). Replaced
with a Python-based readiness probe (TCP connect + HTTP GET +
JSON parse + status==active check, single source of truth) and a
kill -0 on the server PID so a crash surfaces with stderr instead
of timing out silently. Bumped the ceiling 10s -> 15s (75 * 0.2s)
for busy runners.
2) Inactive-token case omitted CF_ZONE_ID: only CF_API_TOKEN was set
for case (b), so the script's 'need CF_ZONE_ID' guard short-
circuited BEFORE the preflight and we never actually exercised
the auth-failure path. Set the full ENV_TOKENS (same as the
success case) for (b) so a missing CF_ZONE_ID can't mask the
regression we want to catch.
3) EXPECTED_COUNT=3 was stale: the preflight addition brought the
CF base refs in sweep-cf-orphans.sh from 3 to 4 (token-verify +
zone-lookup in the preflight block, plus the original 2 in the
sweep body). The patch-and-redirect test then replaced 4
occurrences, not 3, and the count assertion failed. Updated to 4
with a comment.
4) Server returned zone id 'zones' for active/down: the Python
mock extracted zone_id from rest.split('/')[2] which is the
literal 'zones' token, not the actual zone id (which lives at
index 3 after the /client/v4/ prefix). Active/down cases then
tripped the preflight's zone-mismatch check. Use seg[3] (with a
seg[-1] fallback) and add a comment explaining the layout.
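The indexing fix in item 4 can be illustrated with a minimal sketch. It assumes the mock receives paths shaped like /client/v4/zones/<zone_id> and strips the leading slash before splitting; the function name extract_zone_id is hypothetical and is not taken from the test harness.

```python
def extract_zone_id(path: str) -> str:
    """Return the zone id from a path like /client/v4/zones/<zone_id>[/...].

    Hypothetical helper: the real mock may structure this differently.
    """
    # Drop any query string and the leading slash before splitting.
    rest = path.split("?", 1)[0].lstrip("/")
    seg = rest.split("/")
    # Assumed layout: seg == ["client", "v4", "zones", "<zone_id>", ...]
    # seg[2] is the literal "zones" token; the id sits at seg[3].
    if len(seg) > 3 and seg[3]:
        return seg[3]
    # Fallback for shorter paths: use the last segment.
    return seg[-1]
```

With this layout, reading index 2 always yields the string "zones", which is why the active/down cases tripped the zone-mismatch check.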
No change to the preflight behavior in scripts/ops/sweep-cf-orphans.sh
— only the test harness. The four critical behaviors are now
exercised deterministically:
(a) active token + reachable zone -> preflight passes
(b) inactive token -> preflight fails fast, no gather
(c) zone id mismatch -> preflight fails on mismatch
(d) 500 + non-JSON -> preflight fails on non-JSON
Locally verified: 'bash scripts/ops/test_sweep_cf_orphans_preflight.sh'
prints all four PASS lines and exits 0.
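The readiness approach described in item 1 could look roughly like the sketch below. It is not the harness's actual code: the function names, the response shape ({"result": {"status": "active"}}), and the use of a Popen-like object for the liveness check are all assumptions made for illustration. The 75 attempts at 0.2s each match the 15s ceiling stated above.

```python
import json
import socket
import time
import urllib.request


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for a free port, setting SO_REUSEADDR on the probe socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((host, 0))
        return s.getsockname()[1]


def is_active(body):
    """Assumed response shape: {"result": {"status": "active"}}."""
    result = body.get("result") if isinstance(body, dict) else None
    return isinstance(result, dict) and result.get("status") == "active"


def wait_until_ready(proc, host, port, path, attempts=75, delay=0.2):
    """Poll until the server answers with an active status.

    `proc` is any object with poll() and returncode (e.g. subprocess.Popen).
    Returns True when ready, False on timeout, and raises if the server died.
    """
    url = f"http://{host}:{port}{path}"
    for _ in range(attempts):
        # Counterpart of `kill -0 $PID`: fail loudly if the server exited.
        if proc.poll() is not None:
            raise RuntimeError(f"server exited with code {proc.returncode}")
        try:
            # TCP connect first, then HTTP GET + JSON parse + status check.
            with socket.create_connection((host, port), timeout=1.0):
                pass
            with urllib.request.urlopen(url, timeout=1.0) as resp:
                body = json.loads(resp.read().decode("utf-8"))
            if is_active(body):
                return True
        except (OSError, ValueError):
            # Not listening yet, HTTP error, or non-JSON body: retry.
            pass
        time.sleep(delay)
    return False
```

Keeping every step in one function avoids the pipe-handle races of a chained curl+grep pipeline, and the early liveness check turns a crashed server into an immediate error instead of a silent timeout.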
scripts/
Operational and one-off scripts for molecule-core. Most are self-documenting — see the header comments in each file.
RFC #2251 coordinator task-bound harnesses
There are three related scripts; pick the right one:
| Script | Purpose | Targets |
|---|---|---|
| measure-coordinator-task-bounds.sh | Canonical v1 harness for the RFC #2251 / Issue 4 reproduction. Provisions a PM coordinator + Researcher child via claude-code-default + claude-code templates, sends a synthesis-heavy A2A kickoff, observes elapsed time + activity trace. | OSS-shape platform: localhost or any /workspaces-shaped endpoint. Has tenant/admin-token guards for non-localhost runs. |
| measure-coordinator-task-bounds-runner.sh | Generalised runner for the same measurement contract but with arbitrary template + secret + model combinations (Hermes/MiniMax, etc.). Useful for cross-runtime variants without modifying the canonical harness. | Same as above (local or SaaS via MODE=saas). |
| measure-coordinator-task-bounds.sh (in molecule-controlplane) | Production-shape variant that bootstraps a real staging tenant via POST /cp/admin/orgs, then runs the same measurement against <slug>.staging.moleculesai.app. | Staging controlplane only; refuses to run against production. |
See reference_harness_pair_pattern (auto-memory) for when to use which
and the cross-repo design rationale.
Common safety pattern across all three
- Cleanup trap on EXIT/INT/TERM auto-deletes provisioned resources.
- DRY_RUN=1 prints plan + auth fingerprint, exits before any state mutation. Run this before pointing at staging or any shared infrastructure.
- Non-target guard refuses arbitrary endpoints (the controlplane variant is locked to staging-api.moleculesai.app; the OSS variant requires explicit auth + tenant scoping for non-localhost PLATFORM).
- Cleanup failures emit cleanup_*_failed events with remediation hints; no silenced curl. ADMIN_TOKEN expiring mid-run surfaces as a structured event rather than a silent leak.
Activity trace caveat
If activity_trace.raw == "<endpoint_unavailable>", the per-workspace
/activity endpoint isn't wired on the target build — the bound
measurement is INCONCLUSIVE on the platform-ceiling question. Either
wire the endpoint or replace with the equivalent Datadog query. Note
that /activity accepts a since_secs query parameter; see the
endpoint handler for the supported range.
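As a rough illustration of the caveat above, a post-run check might look like the following sketch. It assumes the harness output is a JSON object with an activity_trace.raw field as described; the function name and the "inconclusive" / "usable" labels are hypothetical.

```python
def classify_bound_measurement(result: dict) -> str:
    """Label a harness result based on whether the activity trace was available.

    Assumes `result` is the parsed harness output containing
    {"activity_trace": {"raw": ...}}; other fields are ignored here.
    """
    trace = result.get("activity_trace") or {}
    if trace.get("raw") == "<endpoint_unavailable>":
        # /activity is not wired on the target build, so the
        # platform-ceiling question cannot be answered from this run.
        return "inconclusive"
    return "usable"
```

A run labeled "inconclusive" should be repeated after wiring the endpoint or backed by the equivalent Datadog query.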
Other scripts
- cleanup-rogue-workspaces.sh: emergency teardown for leaked workspaces. Prompts for confirmation. Pair with the harnesses if a cleanup trap fails (see cleanup_*_failed events).
- staging-smoke.sh: quick smoke test for the staging canary fleet (formerly canary-smoke.sh).
- dev-start.sh: local-dev platform bring-up.
The rest are self-documenting in their header comments.