
hongming fe1e3722eb	2026-05-12 07:28:34 +00:00

feat(scripts): codify ECR :staging-latest → :latest promote + tenant redeploy (closes #660)

Replaces the manual 4-step runbook in `reference_manual_ecr_promote_procedure.md` with a single self-contained script + 40 mock-driven e2e tests + a CI gate.

## What's in this change

### `scripts/promote-tenant-image.sh`

The script does the full chain end-to-end:

1. PREFLIGHT: AWS auth ok, source-tag exists, CP base reachable. Exits 1 with no mutations if anything's wrong.
2. SNAPSHOT: saves the current dest-tag manifest as `<dest>-prev-YYYYMMDD`. Idempotent: same UTC day re-runs are no-ops.
3. PROMOTE: copies `<source-tag>` manifest → `<dest-tag>` via `aws ecr put-image` with the OCI image-index media type (preserves inner child-manifest digest per `reference_ecr_cross_account_digest_exact_mirror`).
4. REDEPLOY: per-tenant POST `/cp/admin/tenants/<slug>/redeploy`. On HTTP 403 (stale tenant docker ECR auth, `feedback_ec2_ecr_auth_12h_stale`) it SSM-refreshes the EC2's docker login and retries once.
5. VERIFY: per-tenant `/buildinfo` + `/health` probes. Failure here triggers auto-rollback.
6. ROLLBACK (on failure): re-promotes the rollback tag back to `<dest-tag>` and redeploys the fleet. Exits 3 if rollback OK, 4 if not.

Every external call (aws/curl/ssm) is wrapped in a function with a `--mock-dir` injection point so the tests can drive every branch without touching real infrastructure.

### `scripts/test-promote-tenant-image.sh`

40 cases across 11 test groups:

- happy path (5 assertions on call counts + exit code)
- preflight failures with no mutations
- snapshot idempotency
- `--dry-run` skips all mutations
- 403 → SSM-refresh → retry path
- redeploy fail with vs without rollback (exit 3 vs 4)
- argument validation (missing/conflicting/unknown flags)
- date override for rollback tag naming
- empty source manifest detection
- verify-failure triggers rollback

Runs `bash scripts/test-promote-tenant-image.sh`. No live infra touched.

### `.gitea/workflows/ci.yml`

Two new steps in the existing `Shellcheck (E2E scripts)` job (a required check on `main`), gated by the existing `scripts` change filter (`scripts/`, `tests/e2e/`, `infra/scripts/`, or this workflow file itself):

1. Run `scripts/test-promote-tenant-image.sh`: fails CI if any of the 40 cases regresses.
2. Run `shellcheck --severity=warning` on the two files. The bulk shellcheck step intentionally excludes `scripts/` for legacy SC3040/SC3043 reasons; explicit invocation here catches new regressions in the promote script without unblocking the bulk cleanup.

## Validated locally

```
$ bash scripts/test-promote-tenant-image.sh
...
All 40 tests passed.
$ shellcheck --severity=warning scripts/promote-tenant-image.sh scripts/test-promote-tenant-image.sh
(clean)
```

## Closes

- core#660: "Codify manual ECR promote operation as `scripts/promote-tenant-image.sh`" (tier:medium, core-devops)

## Cross-links

- core#658: proper fix for the 12h-stale tenant ECR auth (this script ships the SSM-refresh workaround pending the credential-helper rollout).
- `reference_manual_ecr_promote_procedure.md` (memory): the manual procedure this script replaces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
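
The SNAPSHOT step in the commit message above names its rollback tag `<dest>-prev-YYYYMMDD` and treats a same-UTC-day re-run as a no-op. A minimal sketch of that naming and idempotency check might look like the following. It is not taken from `promote-tenant-image.sh`: the function names and the `SNAPSHOT_DATE` override variable are hypothetical, and existing tags are passed as arguments rather than read from ECR.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Build the rollback tag "<dest>-prev-YYYYMMDD" from the UTC date.
# SNAPSHOT_DATE is a hypothetical override, used here only for illustration.
snapshot_tag() {
  local dest_tag="$1"
  local day="${SNAPSHOT_DATE:-$(date -u +%Y%m%d)}"
  printf '%s-prev-%s\n' "$dest_tag" "$day"
}

# Decide whether a snapshot is needed, given the tags that already exist.
# Usage: snapshot_action <wanted-tag> [existing-tag ...]
snapshot_action() {
  local wanted="$1"; shift
  local tag
  for tag in "$@"; do
    if [ "$tag" = "$wanted" ]; then
      echo "skip"    # same UTC day re-run: the snapshot already exists
      return 0
    fi
  done
  echo "create"
}
```

For example, `snapshot_action "$(snapshot_tag latest)" latest staging-latest` prints `create` on the first run of a day, and `skip` once the dated tag is in the list.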
demo-freeze-snapshots	ops: demo-day freeze + rollback runbook	2026-05-01 12:04:30 -07:00
ops	fix(ci): extend class-E rename to scripts/ops/sweep-*.sh (chained-defect from #430 review)	2026-05-11 01:32:26 -07:00
build_runtime_package.py	fix(ci): add _sanitize_a2a to TOP_LEVEL_MODULES allowlist (third workflow defect)	2026-05-10 19:32:58 -07:00
build-images.sh
bundle-compile.sh
check-cascade-list-vs-manifest.sh	feat(ci): structural drift gate for cascade list vs manifest (RFC #388 PR-3)	2026-05-03 03:52:39 -07:00
check-stale-promote-pr.sh	fix(ci): replace gh pr CLI with Gitea v1 REST in workflows + scripts (#75 class A)	2026-05-07 15:29:26 -07:00
cleanup-rogue-workspaces.sh
clone-manifest.sh	fix(ci): strip JSON5 comments from manifest.json before jq parse	2026-05-11 22:02:02 +00:00
demo-day-runbook.md	ops: demo-day freeze + rollback runbook	2026-05-01 12:04:30 -07:00
demo-freeze.sh	fix(scripts): migrate ghcr.io→ECR + raw.githubusercontent.com→Gitea (#46 )	2026-05-07 00:56:23 -07:00
demo-thaw.sh	ops: demo-day freeze + rollback runbook	2026-05-01 12:04:30 -07:00
dev-start.sh
edge-429-probe.sh	chore(observability): edge-429 probe + ratelimit observability runbook	2026-05-07 15:48:34 -07:00
import-agent.sh
lockdown-tenant-sg.sh
measure-coordinator-task-bounds-runner.sh
measure-coordinator-task-bounds.sh
nuke-and-rebuild.sh	tech-debt: rename molecule-monorepo-net -> molecule-core-net	2026-05-09 20:51:48 +00:00
post-rebuild-setup.sh
promote-tenant-image.sh	feat(scripts): codify ECR :staging-latest → :latest promote + tenant redeploy (closes #660 )	2026-05-12 07:28:34 +00:00
README.md	refactor(ci): drop "canary-" prefix → staging-smoke/staging-verify (Hongming directive 2026-05-11) (#443 )	2026-05-11 11:25:29 +00:00
refresh-workspace-images.sh	fix(scripts): migrate ghcr.io→ECR + raw.githubusercontent.com→Gitea (#46 )	2026-05-07 00:56:23 -07:00
rollback-latest.sh	fix(scripts): migrate ghcr.io→ECR + raw.githubusercontent.com→Gitea (#46 )	2026-05-07 00:56:23 -07:00
staging-smoke.sh	refactor(ci): drop "canary-" prefix → staging-smoke/staging-verify (Hongming directive 2026-05-11) (#443 )	2026-05-11 11:25:29 +00:00
test_build_runtime_package.py	chore: rewriter unit tests + drop misleading noqa on `import inbox`	2026-04-30 20:45:32 -07:00
test-a2a-cross-runtime.sh
test-all-adapters.sh
test-all-runtimes-a2a-e2e.sh	test(e2e): wire SaaS auth headers (TENANT_ADMIN_TOKEN + TENANT_ORG_ID)	2026-05-02 04:36:23 -07:00
test-all.sh
test-check-stale-promote-pr.sh	feat(ops): hourly alarm for auto-promote PR stuck on REVIEW_REQUIRED (#2975 )	2026-05-05 17:55:27 -07:00
test-cross-agent-chat.sh
test-hermes-plugin-e2e.sh	test(e2e): unified A2A round-trip parity harness across all 4 runtimes	2026-05-02 04:36:23 -07:00
test-nuke-and-rebuild.sh
test-promote-tenant-image.sh	feat(scripts): codify ECR :staging-latest → :latest promote + tenant redeploy (closes #660 )	2026-05-12 07:28:34 +00:00
test-team-e2e.sh
wheel_smoke.py	feat(mcp): notifications/claude/channel for push-feel inbox UX	2026-04-30 20:10:01 -07:00

README.md

scripts/

Operational and one-off scripts for molecule-core. Most are self-documenting — see the header comments in each file.

RFC #2251 coordinator task-bound harnesses

There are three related scripts; pick the right one:

| Script | Purpose | Targets |
| --- | --- | --- |
| `measure-coordinator-task-bounds.sh` | Canonical v1 harness for the RFC #2251 / Issue 4 reproduction. Provisions a PM coordinator + Researcher child via `claude-code-default` + `langgraph` templates, sends a synthesis-heavy A2A kickoff, observes elapsed time + activity trace. | OSS-shape platform: localhost or any `/workspaces`-shaped endpoint. Has tenant/admin-token guards for non-localhost runs. |
| `measure-coordinator-task-bounds-runner.sh` | Generalised runner for the same measurement contract but with arbitrary template + secret + model combinations (Hermes/MiniMax, etc.). Useful for cross-runtime variants without modifying the canonical harness. | Same as above (local or SaaS via `MODE=saas`). |
| `measure-coordinator-task-bounds.sh` (in molecule-controlplane) | Production-shape variant that bootstraps a real staging tenant via `POST /cp/admin/orgs`, then runs the same measurement against `<slug>.staging.moleculesai.app`. | Staging controlplane only; refuses to run against production. |

See reference_harness_pair_pattern (auto-memory) for when to use which and the cross-repo design rationale.

Common safety pattern across all three

- Cleanup trap on EXIT/INT/TERM auto-deletes provisioned resources.
- `DRY_RUN=1` prints the plan + auth fingerprint and exits before any state mutation. Run this before pointing at staging or any shared infrastructure.
- Non-target guard refuses arbitrary endpoints (the controlplane variant is locked to `staging-api.moleculesai.app`; the OSS variant requires explicit auth + tenant scoping for non-localhost `PLATFORM`).
- Cleanup failures emit `cleanup_*_failed` events with remediation hints; no curl call is silenced. An `ADMIN_TOKEN` expiring mid-run surfaces as a structured event rather than a silent leak.
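
The cleanup-trap and `DRY_RUN` behaviour listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the harnesses: `delete_resource`, `run_harness`, the workspace IDs, and the `cleanup_resource_failed` event text are all hypothetical, and the delete call is a local stand-in with no network access.

```bash
#!/usr/bin/env bash
set -euo pipefail

PROVISIONED=()   # IDs of resources created by this run
CLEANED=()       # IDs the stand-in delete call has handled

# Stand-in for the real delete request (assumption: no network here).
delete_resource() {
  CLEANED+=("$1")
  echo "deleted $1"
}

# Delete everything still registered and report failures instead of hiding them.
cleanup() {
  [ "${#PROVISIONED[@]}" -gt 0 ] || return 0
  local id
  for id in "${PROVISIONED[@]}"; do
    delete_resource "$id" \
      || echo "cleanup_resource_failed id=$id hint=run-cleanup-rogue-workspaces" >&2
  done
  PROVISIONED=()
}

# EXIT does the actual cleanup; INT/TERM exit so the EXIT trap fires once.
trap cleanup EXIT
trap 'exit 130' INT
trap 'exit 143' TERM

run_harness() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "dry-run: plan printed, no state mutated"
    return 0
  fi
  PROVISIONED+=("ws-coordinator" "ws-child")
  echo "provisioned ${#PROVISIONED[@]} workspaces"
}
```

Calling `DRY_RUN=1 run_harness` returns before anything is registered, so the trap has nothing to delete. Routing INT and TERM through `exit` keeps the cleanup in one place rather than running it from several handlers.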

Activity trace caveat

If activity_trace.raw == "<endpoint_unavailable>", the per-workspace /activity endpoint isn't wired on the target build — the bound measurement is INCONCLUSIVE on the platform-ceiling question. Either wire the endpoint or replace with the equivalent Datadog query. Note that /activity accepts a since_secs query parameter; see the endpoint handler for the supported range.
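
A post-run check for the caveat above could be as small as the sketch below. The function name and the labels it prints are hypothetical, not harness output; the only value taken from the text is the `<endpoint_unavailable>` sentinel.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Classify a run from the value of activity_trace.raw.
# The returned labels are illustrative only.
classify_bound_measurement() {
  local raw="$1"
  if [ "$raw" = "<endpoint_unavailable>" ]; then
    echo "inconclusive"      # /activity not wired on the target build
  else
    echo "trace_available"
  fi
}
```

If `jq` is installed, the value could be fed in with something like `classify_bound_measurement "$(jq -r '.activity_trace.raw' result.json)"`, where `result.json` is an assumed name for the saved harness output.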

Other scripts

- `cleanup-rogue-workspaces.sh`: emergency teardown for leaked workspaces. Prompts for confirmation. Pair with the harnesses if a cleanup trap fails (see `cleanup_*_failed` events).
- `staging-smoke.sh`: quick smoke test for the staging canary fleet (formerly `canary-smoke.sh`).
- `dev-start.sh`: local-dev platform bring-up.

The rest are self-documenting in their header comments.