Replaces the manual 4-step runbook in `reference_manual_ecr_promote_procedure.md` with a single self-contained script, 40 mock-driven e2e tests, and a CI gate.

## What's in this change

### `scripts/promote-tenant-image.sh`

The script runs the full chain end-to-end:

1. **PREFLIGHT**: checks that AWS auth works, the source tag exists, and the CP base is reachable. Exits 1 with no mutations if any check fails.
2. **SNAPSHOT**: saves the current dest-tag manifest as `<dest>-prev-YYYYMMDD`. Idempotent: re-runs on the same UTC day are no-ops.
3. **PROMOTE**: copies the `<source-tag>` manifest to `<dest-tag>` via `aws ecr put-image` with the OCI image-index media type (preserves the inner child-manifest digest, per `reference_ecr_cross_account_digest_exact_mirror`).
4. **REDEPLOY**: POSTs `/cp/admin/tenants/<slug>/redeploy` for each tenant. On HTTP 403 (stale tenant docker ECR auth, see `feedback_ec2_ecr_auth_12h_stale`), it refreshes the EC2's docker login via SSM and retries once.
5. **VERIFY**: probes `/buildinfo` and `/health` for each tenant. A failure here triggers auto-rollback.
6. **ROLLBACK** (on failure): re-promotes the rollback tag to `<dest-tag>` and redeploys the fleet. Exits 3 if the rollback succeeds, 4 if it does not.

Every external call (aws/curl/ssm) is wrapped in a function with a `--mock-dir` injection point, so the tests can drive every branch without touching real infrastructure.

### `scripts/test-promote-tenant-image.sh`

40 cases across 11 test groups:

- happy path (5 assertions on call counts + exit code)
- preflight failures with no mutations
- snapshot idempotency
- `--dry-run` skips all mutations
- 403 → SSM-refresh → retry path
- redeploy fail with vs without rollback (exit 3 vs 4)
- argument validation (missing/conflicting/unknown flags)
- date override for rollback tag naming
- empty source manifest detection
- verify-failure triggers rollback

Run it with `bash scripts/test-promote-tenant-image.sh`. No live infra is touched.
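The `--mock-dir` injection point and the REDEPLOY step's 403 branch can be sketched in bash roughly as follows. This is a minimal sketch, not the script's actual code: it assumes a hypothetical mock-dir layout where `cp_status` holds one HTTP status per line and `calls.log` records every call, and the function names (`cp_post`, `ssm_refresh_ecr_login`, `redeploy_tenant`) and the `CP_BASE` default are illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: a mock-injectable control-plane wrapper plus the
# "403 -> SSM refresh -> retry once" redeploy branch. All names, the
# MOCK_DIR file layout, and the CP_BASE default are hypothetical.
set -uo pipefail

MOCK_DIR="${MOCK_DIR:-}"
CP_BASE="${CP_BASE:-https://cp.example.invalid}"

# cp_post PATH: print the HTTP status of a POST to the control plane.
# Mock mode logs the call and pops the next status from $MOCK_DIR/cp_status.
cp_post() {
  local path="$1" status
  if [[ -n "$MOCK_DIR" ]]; then
    echo "POST $path" >> "$MOCK_DIR/calls.log"
    status="$(head -n 1 "$MOCK_DIR/cp_status")"
    tail -n +2 "$MOCK_DIR/cp_status" > "$MOCK_DIR/cp_status.tmp"
    mv "$MOCK_DIR/cp_status.tmp" "$MOCK_DIR/cp_status"
    echo "$status"
    return 0
  fi
  # Real mode: curl prints 000 if the connection itself fails.
  curl -s -o /dev/null -w '%{http_code}' -X POST "$CP_BASE$path"
}

# ssm_refresh_ecr_login SLUG: refresh the tenant EC2's docker ECR login.
# Only the mock path is implemented; the real SSM call is out of scope here.
ssm_refresh_ecr_login() {
  if [[ -n "$MOCK_DIR" ]]; then
    echo "SSM-REFRESH $1" >> "$MOCK_DIR/calls.log"
    return 0
  fi
  echo "ssm_refresh_ecr_login: real SSM path not implemented in this sketch" >&2
  return 1
}

# redeploy_tenant SLUG: succeed on any 2xx; on 403, refresh and retry once.
redeploy_tenant() {
  local slug="$1" status
  status="$(cp_post "/cp/admin/tenants/$slug/redeploy")"
  if [[ "$status" == "403" ]]; then
    ssm_refresh_ecr_login "$slug" || return 1
    status="$(cp_post "/cp/admin/tenants/$slug/redeploy")"
  fi
  [[ "$status" == 2?? ]]
}
```

In a test, writing `403` then `200` into `cp_status` exercises the retry branch: `redeploy_tenant` succeeds and `calls.log` records two POSTs and one `SSM-REFRESH`. Two consecutive `403`s make it fail, which is what would hand control to the rollback path.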
### `.gitea/workflows/ci.yml`

Two new steps in the existing `Shellcheck (E2E scripts)` job (a required check on `main`), gated by the existing `scripts` change filter (`scripts/`, `tests/e2e/`, `infra/scripts/`, or this workflow file itself):

1. Run `scripts/test-promote-tenant-image.sh`; CI fails if any of the 40 cases regresses.
2. Run `shellcheck --severity=warning` on the two files. The bulk shellcheck step intentionally excludes `scripts/` for legacy SC3040/SC3043 reasons; invoking shellcheck explicitly here catches new regressions in the promote script without waiting on the bulk cleanup.

## Validated locally

```
$ bash scripts/test-promote-tenant-image.sh
...
All 40 tests passed.

$ shellcheck --severity=warning scripts/promote-tenant-image.sh scripts/test-promote-tenant-image.sh
(clean)
```

## Closes

- core#660: "Codify manual ECR promote operation as `scripts/promote-tenant-image.sh`" (tier:medium, core-devops)

## Cross-links

- core#658: proper fix for the 12h-stale tenant ECR auth (this script ships the SSM-refresh workaround pending the credential-helper rollout).
- `reference_manual_ecr_promote_procedure.md` (memory): the manual procedure this script replaces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# scripts/
Operational and one-off scripts for molecule-core. Most are self-documenting — see the header comments in each file.
## RFC #2251 coordinator task-bound harnesses
There are three related scripts; pick the right one:
| Script | Purpose | Targets |
|---|---|---|
| `measure-coordinator-task-bounds.sh` | Canonical v1 harness for the RFC #2251 / Issue 4 reproduction. Provisions a PM coordinator + Researcher child via the claude-code-default + langgraph templates, sends a synthesis-heavy A2A kickoff, and observes elapsed time + activity trace. | OSS-shape platform: localhost or any `/workspaces`-shaped endpoint. Has tenant/admin-token guards for non-localhost runs. |
| `measure-coordinator-task-bounds-runner.sh` | Generalised runner for the same measurement contract, but with arbitrary template + secret + model combinations (Hermes/MiniMax, etc.). Useful for cross-runtime variants without modifying the canonical harness. | Same as above (local, or SaaS via `MODE=saas`). |
| `measure-coordinator-task-bounds.sh` (in molecule-controlplane) | Production-shape variant that bootstraps a real staging tenant via `POST /cp/admin/orgs`, then runs the same measurement against `<slug>.staging.moleculesai.app`. | Staging controlplane only; refuses to run against production. |
See `reference_harness_pair_pattern` (auto-memory) for when to use which script
and for the cross-repo design rationale.
### Common safety pattern across all three

- Cleanup trap on `EXIT`/`INT`/`TERM` auto-deletes provisioned resources.
- `DRY_RUN=1` prints the plan + auth fingerprint and exits before any state mutation. Run this before pointing at staging or any shared infrastructure.
- Non-target guard refuses arbitrary endpoints (the controlplane variant is locked to `staging-api.moleculesai.app`; the OSS variant requires explicit auth + tenant scoping for a non-localhost `PLATFORM`).
- Cleanup failures emit `cleanup_*_failed` events with remediation hints; no silenced curl. `ADMIN_TOKEN` expiring mid-run surfaces as a structured event rather than a silent leak.
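The safety pattern above can be sketched in bash as follows. This is a minimal illustration, not code from the harnesses: the variable and function names, the default `PLATFORM` port, the last-4-characters fingerprint, and the `cleanup_workspace_failed` event name are all hypothetical, and provisioning/deletion are stand-ins for the real API calls.

```shell
#!/usr/bin/env bash
# Sketch only: endpoint guard, DRY_RUN=1 early exit, and an EXIT trap that
# deletes provisioned workspaces and reports failures as structured events.
# Names, the default port, and the fingerprint format are hypothetical.
set -o pipefail

PLATFORM="${PLATFORM:-http://localhost:8080}"
DRY_RUN="${DRY_RUN:-0}"
WORKSPACE_IDS=()

emit_event() {  # emit_event NAME DETAIL: one JSON-shaped line on stdout
  printf '{"event":"%s","detail":"%s"}\n' "$1" "$2"
}

fingerprint() {  # last 4 chars of ADMIN_TOKEN, never the full secret
  local t="${ADMIN_TOKEN:-}"
  if [[ -z "$t" ]]; then echo "none"; else echo "****${t: -4}"; fi
}

# Localhost is always allowed; anything else needs explicit auth + tenant.
guard_target() {
  case "$PLATFORM" in
    http://localhost|http://localhost[:/]*|http://127.0.0.1|http://127.0.0.1[:/]*) return 0 ;;
  esac
  if [[ -z "${ADMIN_TOKEN:-}" || -z "${TENANT:-}" ]]; then
    emit_event refused_target "$PLATFORM requires ADMIN_TOKEN and TENANT"
    return 1
  fi
}

delete_workspace() {  # stand-in for the real DELETE request
  echo "deleted $1"
}

cleanup() {
  local id
  for id in "${WORKSPACE_IDS[@]}"; do
    delete_workspace "$id" || emit_event cleanup_workspace_failed "$id: delete it manually"
  done
}

main() {
  guard_target || return 2
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "DRY_RUN: would provision coordinator + child on $PLATFORM (auth $(fingerprint))"
    return 0
  fi
  # INT/TERM exit with the conventional codes; the EXIT trap then runs cleanup once.
  trap cleanup EXIT
  trap 'exit 130' INT
  trap 'exit 143' TERM
  WORKSPACE_IDS+=("ws-coordinator" "ws-child")  # stand-in for real provisioning
  echo "provisioned ${#WORKSPACE_IDS[@]} workspaces"
}
```

Routing `INT`/`TERM` through `exit` rather than calling `cleanup` directly means teardown runs exactly once, from the `EXIT` trap, whichever way the run ends. Note that the guard matches `localhost` followed by `:` or `/` so that a host like `localhost.example.com` is not mistaken for a local target.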
### Activity trace caveat

If `activity_trace.raw == "<endpoint_unavailable>"`, the per-workspace
`/activity` endpoint isn't wired on the target build, so the bound
measurement is INCONCLUSIVE on the platform-ceiling question. Either
wire the endpoint or replace it with the equivalent Datadog query. Note
that `/activity` accepts a `since_secs` query parameter; see the
endpoint handler for the supported range.
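As a rough illustration of applying this caveat when post-processing harness output, here is a bash sketch. It assumes the per-workspace route is `/workspaces/<id>/activity` (hypothetical; check the endpoint handler) and that `activity_trace.raw` has already been extracted as a string. The function names and the `TRACE_AVAILABLE` label are illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: build a /activity URL with since_secs and classify the
# harness's activity_trace.raw value. The /workspaces/<id>/activity route is
# an assumption; check the endpoint handler for the real path and range.
set -uo pipefail

# activity_url PLATFORM WORKSPACE_ID SINCE_SECS
activity_url() {
  local base="${1%/}" ws="$2" since="$3"
  # Only a basic integer check; the supported range is defined server-side.
  [[ "$since" =~ ^[0-9]+$ ]] || { echo "since_secs must be an integer" >&2; return 1; }
  printf '%s/workspaces/%s/activity?since_secs=%s\n' "$base" "$ws" "$since"
}

# trace_verdict RAW: the sentinel means the bound measurement is inconclusive.
trace_verdict() {
  if [[ "$1" == "<endpoint_unavailable>" ]]; then
    echo "INCONCLUSIVE"
  else
    echo "TRACE_AVAILABLE"
  fi
}
```

A non-sentinel value only means a trace exists; answering the platform-ceiling question still requires reading the trace itself.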
## Other scripts

- `cleanup-rogue-workspaces.sh`: emergency teardown for leaked workspaces. Prompts for confirmation. Pair it with the harnesses if a cleanup trap fails (see `cleanup_*_failed` events).
- `staging-smoke.sh`: quick smoke test for the staging canary fleet (formerly `canary-smoke.sh`).
- `dev-start.sh`: local-dev platform bring-up.

The rest are self-documenting in their header comments.