Compare commits

..

34 Commits

Author SHA1 Message Date
core-devops d92a4a88bf fix(workspace): resolve PR #475 test failures
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 12s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 14s
Harness Replays / detect-changes (pull_request) Failing after 17s
Harness Replays / Harness Replays (pull_request) Has been skipped
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 17s
CI / Detect changes (pull_request) Successful in 40s
sop-tier-check / tier-check (pull_request) Successful in 19s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 41s
E2E API Smoke Test / detect-changes (pull_request) Successful in 41s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 42s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 35s
CI / Platform (Go) (pull_request) Successful in 7s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 10s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m18s
CI / Python Lint & Test (pull_request) Failing after 7m13s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8m45s
CI / Canvas (Next.js) (pull_request) Failing after 8m58s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
OFFSEC-003 (commit 2add6333) wrapped tool_delegate_task results in
[A2A_RESULT_FROM_PEER] boundary markers via sanitize_a2a_result(),
but test_a2a_tools_impl.py was not updated. Fix:

1. test_success_returns_result_text: assert now expects boundary-wrapped
   result — send_a2a_message returns plain "Task completed!" which
   sanitize_a2a_result wraps before returning.

2. test_peer_name_cached_from_peer_names_dict: same — "done" is now
   wrapped.

3. test_peer_name_falls_back_to_id_prefix: same — "ok" is now wrapped.

4. Remove TestDelegateTaskDirect class (3 dead tests for
   a2a_tools.delegate_task which does not exist in the codebase —
   added in commit 93b7d9a8 when the function existed, removed in the
   a2a_tools_delegation.py extraction refactor).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 14:42:21 +00:00
infra-runtime-be 023a6a781c fix(workspace): default PLATFORM_URL to host.docker.internal in all modules
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 10s
CI / Detect changes (pull_request) Successful in 24s
E2E API Smoke Test / detect-changes (pull_request) Successful in 26s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 12s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 25s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 27s
sop-tier-check / tier-check (pull_request) Successful in 15s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 26s
CI / Platform (Go) (pull_request) Successful in 9s
CI / Canvas (Next.js) (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 6s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 15s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 8s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 14s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m48s
CI / Python Lint & Test (pull_request) Failing after 7m4s
The legacy if/else in a2a_cli.py and a2a_client.py distinguished
Docker vs non-Docker to choose localhost vs host.docker.internal as
the PLATFORM_URL default. Since workspace code always runs inside a
container (regardless of whether /.dockerenv exists), localhost:8080
is never reachable from the workspace. Collapse the if/else to a
single default of http://host.docker.internal:8080 in both modules.

builtin_tools/temporal_workflow.py had the same issue at two call
sites. Extract a _platform_url() helper that returns the env var or
the correct default, and update both call sites.

Fixes: the temporal checkpoint fetch and save activities silently
failed on any workspace that relied on the default PLATFORM_URL.
2026-05-11 12:43:26 +00:00
core-be e70955298b Merge pull request 'docs(runbooks): add Gitea Actions operational quirks reference' (#457) from docs/gitea-operational-quirks-runbook into main
Block internal-flavored paths / Block forbidden paths (push) Successful in 17s
CI / Detect changes (push) Successful in 29s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 11s
E2E API Smoke Test / detect-changes (push) Successful in 28s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 25s
Handlers Postgres Integration / detect-changes (push) Successful in 24s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 24s
CI / Platform (Go) (push) Successful in 7s
CI / Shellcheck (E2E scripts) (push) Successful in 7s
CI / Canvas (Next.js) (push) Successful in 7s
CI / Python Lint & Test (push) Successful in 6s
CI / Canvas Deploy Reminder (push) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 8s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 6s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 8s
Sweep stale Cloudflare Tunnels / Sweep CF tunnels (push) Failing after 15s
Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push) Failing after 14s
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 22s
Runtime Pin Compatibility / PyPI-latest install + import smoke (push) Successful in 1m34s
Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push) Failing after 5m0s
main-red-watchdog / watchdog (push) Successful in 1m7s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Failing after 5m11s
2026-05-11 12:37:37 +00:00
core-lead db647de1cd Merge branch 'main' into docs/gitea-operational-quirks-runbook
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 10s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 16s
sop-tier-check / tier-check (pull_request) Successful in 17s
CI / Detect changes (pull_request) Successful in 38s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 39s
E2E API Smoke Test / detect-changes (pull_request) Successful in 40s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 37s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 37s
CI / Platform (Go) (pull_request) Successful in 8s
CI / Canvas (Next.js) (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
CI / Python Lint & Test (pull_request) Successful in 14s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 8s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 10s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
audit-force-merge / audit (pull_request) Successful in 19s
2026-05-11 12:35:58 +00:00
core-devops 94b08ef0de docs(runbooks): add Gitea Actions operational quirks reference
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 16s
Harness Replays / detect-changes (pull_request) Failing after 20s
Harness Replays / Harness Replays (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 50s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 19s
sop-tier-check / tier-check (pull_request) Successful in 25s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 1m2s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8m35s
Documents four persistent operational findings from the 2026-05-11
Gitea migration and CI noise investigation:

1. Runner network isolation (git remote unreachable from container)
2. continue-on-error only works at step level, not job level
3. workflow_dispatch.inputs not supported
4. fetch-depth:0 on actions/checkout times out

References PR #441 (harness-replays detect-changes fix) and
Task #173 (pre-clone manifest deps pattern).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 12:25:54 +00:00
core-fe 1a2cfb9417 test(canvas): add Toolbar component test coverage (19 cases) (#472)
CI / Canvas Deploy Reminder (push) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (push) Successful in 16s
CI / Detect changes (push) Successful in 39s
E2E API Smoke Test / detect-changes (push) Successful in 38s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 31s
Handlers Postgres Integration / detect-changes (push) Successful in 31s
Harness Replays / detect-changes (push) Successful in 14s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 16s
CI / Platform (Go) (push) Successful in 11s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 1m6s
CI / Shellcheck (E2E scripts) (push) Successful in 9s
CI / Python Lint & Test (push) Successful in 10s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 12s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 10s
Harness Replays / Harness Replays (push) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 11s
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 9s
Sweep stale AWS Secrets Manager secrets / Sweep AWS Secrets Manager (push) Failing after 16s
publish-workspace-server-image / build-and-push (push) Successful in 8m19s
Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push) Failing after 5m12s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 8m50s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Failing after 5m5s
CI / Canvas (Next.js) (push) Has been cancelled
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-committed-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
2026-05-11 12:25:46 +00:00
app-fe 3d572d97a3 fix(canvas/test): use string keys in TIER_CONFIG toHaveProperty calls (#440)
CI / Canvas Deploy Reminder (push) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (push) Successful in 11s
CI / Detect changes (push) Successful in 54s
E2E API Smoke Test / detect-changes (push) Successful in 48s
Harness Replays / detect-changes (push) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 35s
Handlers Postgres Integration / detect-changes (push) Successful in 33s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 18s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 9s
publish-canvas-image / Build & push canvas image (push) Failing after 1m3s
CI / Platform (Go) (push) Successful in 7s
ci-required-drift / drift (push) Failing after 1m15s
CI / Shellcheck (E2E scripts) (push) Successful in 6s
CI / Python Lint & Test (push) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 9s
Harness Replays / Harness Replays (push) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 8s
publish-workspace-server-image / build-and-push (push) Successful in 5m38s
CI / Canvas (Next.js) (push) Has been cancelled
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Has been cancelled
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Failing after 4m49s
Co-authored-by: Molecule AI App-FE <app-fe@agents.moleculesai.app>
Co-committed-by: Molecule AI App-FE <app-fe@agents.moleculesai.app>
2026-05-11 12:15:29 +00:00
claude-ceo-assistant 2747246519 fix(ci): sweep-stale-e2e-orgs reference + drop continue-on-error (closes EC2 leak) (#461)
Block internal-flavored paths / Block forbidden paths (push) Successful in 17s
CI / Detect changes (push) Successful in 1m32s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 1m27s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 15s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 15s
E2E API Smoke Test / detect-changes (push) Successful in 1m34s
Handlers Postgres Integration / detect-changes (push) Successful in 1m28s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 1m14s
CI / Platform (Go) (push) Successful in 8s
CI / Shellcheck (E2E scripts) (push) Successful in 6s
CI / Canvas (Next.js) (push) Successful in 9s
CI / Python Lint & Test (push) Successful in 6s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 9s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 8s
CI / Canvas Deploy Reminder (push) Has been skipped
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 8s
Sweep stale Cloudflare DNS records / Sweep CF orphans (push) Failing after 17s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Failing after 5m37s
Co-authored-by: claude-ceo-assistant <claude-ceo-assistant@agents.moleculesai.app>
Co-committed-by: claude-ceo-assistant <claude-ceo-assistant@agents.moleculesai.app>
2026-05-11 12:05:36 +00:00
core-be 71cfb70a6f Merge pull request 'fix(canvas/test): ApprovalBanner mockReset to prevent queue stacking' (#467) from fix/approvalbanner-mockreset-452 into main
CI / Platform (Go) (push) Blocked by required conditions
CI / Canvas Deploy Reminder (push) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (push) Successful in 9s
Harness Replays / detect-changes (push) Successful in 16s
publish-workspace-server-image / build-and-push (push) Failing after 15s
E2E API Smoke Test / detect-changes (push) Successful in 35s
Handlers Postgres Integration / detect-changes (push) Successful in 43s
CI / Detect changes (push) Successful in 48s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 47s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 17s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 37s
Harness Replays / Harness Replays (push) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 8s
publish-canvas-image / Build & push canvas image (push) Failing after 1m20s
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 19s
CI / Shellcheck (E2E scripts) (push) Successful in 5s
CI / Python Lint & Test (push) Successful in 6s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 6s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 7s
CI / Canvas (Next.js) (push) Has been cancelled
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Has been cancelled
Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push) Failing after 4m52s
main-red-watchdog / watchdog (push) Successful in 56s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Failing after 5m0s
2026-05-11 11:58:53 +00:00
core-be c2d27d2b3f fix(canvas/test): ApprovalBanner mockReset to prevent queue stacking
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 19s
Harness Replays / detect-changes (pull_request) Successful in 20s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 18s
CI / Detect changes (pull_request) Successful in 1m19s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m18s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m15s
sop-tier-check / tier-check (pull_request) Successful in 18s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m14s
CI / Platform (Go) (pull_request) Successful in 7s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 6s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 8s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6s
audit-force-merge / audit (pull_request) Successful in 10s
Harness Replays / Harness Replays (pull_request) Failing after 1m16s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7m56s
CI / Canvas (Next.js) (pull_request) Failing after 9m10s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Cherry-picked from PR #452 (fix/canvas-test-and-design-fixes) which
was closed without merge during the PR #443 cascade. The fix adds a
mockPost reference so individual tests can reset the POST mock cleanly
instead of queueing multiple resolved/rejected values.

Without this, the "shows an error toast when POST fails" and "keeps
the card visible when POST fails" tests queue two responses from
beforeEach's mockResolvedValue({}) and the second mockRejectedValueOnce()
call, causing non-deterministic test outcomes.

Fixes test failures in ApprovalBanner suite.
2026-05-11 11:51:21 +00:00
claude-ceo-assistant ce06b8cd59 Merge pull request 'fix(publish-runtime-autobump): shallow clone + explicit tag fetch (fixes main RED)' (#463) from fix/publish-runtime-autobump-fetch-depth into main
CI / Canvas Deploy Reminder (push) Blocked by required conditions
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (push) Successful in 10s
CI / Detect changes (push) Successful in 32s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 12s
E2E API Smoke Test / detect-changes (push) Successful in 44s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 49s
Handlers Postgres Integration / detect-changes (push) Successful in 48s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 47s
CI / Platform (Go) (push) Successful in 5s
CI / Shellcheck (E2E scripts) (push) Successful in 5s
CI / Canvas (Next.js) (push) Successful in 7s
CI / Python Lint & Test (push) Successful in 6s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Failing after 4m51s
Merge #463 — strict-root cascade clearing
2026-05-11 11:46:15 +00:00
claude-ceo-assistant e0bbba801e Merge branch 'main' into fix/publish-runtime-autobump-fetch-depth
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 13s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 13s
sop-tier-check / tier-check (pull_request) Successful in 11s
E2E API Smoke Test / detect-changes (pull_request) Successful in 34s
CI / Detect changes (pull_request) Successful in 40s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 37s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 37s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 41s
CI / Platform (Go) (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 9s
CI / Canvas (Next.js) (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 11s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 8s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 6s
audit-force-merge / audit (pull_request) Successful in 18s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
2026-05-11 11:39:14 +00:00
claude-ceo-assistant 5c10ee0d73 Merge pull request 'fix(ci): canonicalize MOLECULE_STAGING_ADMIN_TOKEN -> CP_STAGING_ADMIN_API_TOKEN (post-#443 rebase; staging-smoke + 4 e2e-staging-*) + drop staging-smoke continue-on-error' (#464) from fix/canonicalize-staging-admin-token-rebase-462 into main
CI / Platform (Go) (push) Blocked by required conditions
CI / Canvas (Next.js) (push) Blocked by required conditions
CI / Shellcheck (E2E scripts) (push) Blocked by required conditions
CI / Canvas Deploy Reminder (push) Blocked by required conditions
CI / Python Lint & Test (push) Blocked by required conditions
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (push) Successful in 15s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 13s
CI / Detect changes (push) Successful in 39s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 12s
E2E API Smoke Test / detect-changes (push) Successful in 39s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 38s
Handlers Postgres Integration / detect-changes (push) Successful in 38s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 35s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Has started running
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (push) Failing after 4m43s
E2E Staging External Runtime / E2E Staging External Runtime (push) Successful in 5m10s
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 13s
Sweep stale Cloudflare Tunnels / Sweep CF tunnels (push) Failing after 14s
Merge #464 — canonicalize MOLECULE_STAGING_ADMIN_TOKEN → CP_STAGING_ADMIN_API_TOKEN (post-#443 rebase; 5 workflows + 1 doc) + drop staging-smoke continue-on-error + fail-loud Notify. APPROVEs: hongming-pc2 1219 (Owners substance via the old #462 review chain) + core-devops 1241 (whitelist-counted). Completes internal#322 canonicalization.
2026-05-11 11:37:40 +00:00
claude-ceo-assistant 8f1d24f33f fix(ci): canonicalize MOLECULE_STAGING_ADMIN_TOKEN -> CP_STAGING_ADMIN_API_TOKEN (post-#443 rebase) + drop staging-smoke continue-on-error
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 11s
CI / Detect changes (pull_request) Successful in 17s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 17s
E2E API Smoke Test / detect-changes (pull_request) Successful in 20s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 15s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 16s
sop-tier-check / tier-check (pull_request) Successful in 9s
CI / Platform (Go) (pull_request) Successful in 8s
CI / Canvas (Next.js) (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 23s
CI / Python Lint & Test (pull_request) Successful in 10s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 4m27s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 11s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 7s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m13s
audit-force-merge / audit (pull_request) Successful in 20s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 4m50s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9m4s
Re-applies PR#462 on current main (PR#443 merged first and renamed
canary-staging.yml -> staging-smoke.yml, conflicting #462).

Swept 6 files (15 secret-ref flips):

- .gitea/workflows/staging-smoke.yml          (3 refs + drop continue-on-error + add notify-on-failure step)
- .gitea/workflows/e2e-staging-saas.yml       (3 refs)
- .gitea/workflows/e2e-staging-sanity.yml     (3 refs)
- .gitea/workflows/e2e-staging-canvas.yml     (3 refs)
- .gitea/workflows/e2e-staging-external.yml   (3 refs)
- tests/e2e/STAGING_SAAS_E2E.md               (1 heading flip + 1 historical-rename breadcrumb)

Each workflow keeps one inline breadcrumb comment pointing back to
the old name and internal#322.

staging-smoke is the 30-min canary cadence for the entire staging
SaaS stack; silent failure (continue-on-error: true) masked exactly
the regressions the smoke exists to surface, same class as PR#461
(`sweep-stale-e2e-orgs`). Dropped continue-on-error from the smoke
job + added a fail-loud `if: failure()` Notify step mirroring
PR#461. The four other `e2e-staging-*` workflows KEEP
continue-on-error: true per RFC #219 §1 — they are advisory.

Excluded from this PR:
- .gitea/workflows/sweep-stale-e2e-orgs.yml  (PR#461 owns)
- .gitea/workflows/staging-verify.yml         (only references the plural MOLECULE_STAGING_ADMIN_TOKENS canary-fleet secret, out of scope)
- scripts/staging-smoke.sh                    (same — plural only)
- docs/architecture/canary-release.md         (same — plural only)
- .github/ mirror tree                        (separate scope per reference_molecule_core_actions_gitea_only)

Verified locally: yaml.safe_load clean on all 5 workflows; grep
returns ZERO non-breadcrumb references in the swept files; the
plural MOLECULE_STAGING_ADMIN_TOKENS references in
staging-verify.yml / scripts/staging-smoke.sh / canary-release.md
are intentionally untouched.

Refs: internal#322, PR#461, feedback_rename_pr_and_edit_pr_conflict_sequence
2026-05-11 04:33:56 -07:00
claude-ceo-assistant ae30cdef87 refactor(ci): drop "canary-" prefix → staging-smoke/staging-verify (Hongming directive 2026-05-11) (#443)
Block internal-flavored paths / Block forbidden paths (push) Successful in 13s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 11s
CI / Detect changes (push) Successful in 35s
E2E API Smoke Test / detect-changes (push) Successful in 43s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 45s
publish-workspace-server-image / build-and-push (push) Failing after 17s
Handlers Postgres Integration / detect-changes (push) Successful in 52s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 14s
publish-canvas-image / Build & push canvas image (push) Failing after 44s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 43s
Ops Scripts Tests / Ops scripts (unittest) (push) Successful in 51s
CI / Platform (Go) (push) Successful in 7s
CI / Canvas (Next.js) (push) Successful in 8s
CI / Python Lint & Test (push) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 8s
CI / Shellcheck (E2E scripts) (push) Successful in 17s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 10s
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 13s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 6s
Sweep stale AWS Secrets Manager secrets / Sweep AWS Secrets Manager (push) Failing after 12s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (push) Failing after 5m9s
CI / Canvas Deploy Reminder (push) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (push) Failing after 3m25s
Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push) Failing after 4m48s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Failing after 4m57s
Co-authored-by: claude-ceo-assistant <claude-ceo-assistant@agents.moleculesai.app>
Co-committed-by: claude-ceo-assistant <claude-ceo-assistant@agents.moleculesai.app>
2026-05-11 11:25:29 +00:00
core-devops dd992fcc9b fix(publish-runtime-autobump): shallow clone + explicit tag fetch
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 10s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
sop-tier-check / tier-check (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 26s
E2E API Smoke Test / detect-changes (pull_request) Successful in 27s
CI / Detect changes (pull_request) Successful in 27s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 28s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 28s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9s
CI / Platform (Go) (pull_request) Successful in 8s
CI / Canvas (Next.js) (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 6s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 7s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Gitea Actions runners cannot reach https://git.moleculesai.app over HTTPS
(runbooks/gitea-operational-quirks.md §runner-network-isolation).
fetch-depth: 0 on actions/checkout triggers a full repo history fetch
that times out at ~15s, causing the workflow to fail on Gitea runners
(main RED, issue #460).

Fix: use fetch-depth: 1 (shallow clone) and explicitly fetch tags with
git fetch origin --tags --depth=1. The collision check (git tag --list)
still works since we only need the most recent tag, not full history.
git push of the new tag works on a shallow clone.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 11:23:12 +00:00
infra-runtime-be 00f0a1066f Merge pull request 'refactor(workspace): extract idle-loop pending-check guard for direct unit-testing' (#451) from runtime/432-followup-helper-extraction into main
CI / Canvas Deploy Reminder (push) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (push) Successful in 13s
CI / Detect changes (push) Successful in 57s
E2E API Smoke Test / detect-changes (push) Successful in 1m4s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 1m4s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 13s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 1m3s
publish-runtime-autobump / autobump-and-tag (push) Failing after 1m39s
main-red-watchdog / watchdog (push) Successful in 1m19s
CI / Platform (Go) (push) Successful in 10s
CI / Shellcheck (E2E scripts) (push) Successful in 7s
CI / Canvas (Next.js) (push) Successful in 12s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 13s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 15s
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 14s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 2m36s
Sweep stale Cloudflare DNS records / Sweep CF orphans (push) Failing after 21s
CI / Python Lint & Test (push) Has been cancelled
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Has started running
ci-required-drift / drift (push) Failing after 1m23s
2026-05-11 11:02:24 +00:00
infra-runtime-be df2e69b32f ci: re-trigger Gitea Actions status reporting (infra-runtime-be-agent)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 18s
CI / Detect changes (pull_request) Successful in 1m1s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m13s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m17s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 22s
sop-tier-check / tier-check (pull_request) Successful in 29s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 1m31s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m44s
CI / Canvas (Next.js) (pull_request) Successful in 9s
CI / Platform (Go) (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 10s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 7s
audit-force-merge / audit (pull_request) Successful in 20s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m38s
CI / Python Lint & Test (pull_request) Failing after 7m26s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
2026-05-11 10:49:40 +00:00
infra-runtime-be 4a7e1bd988 refactor(workspace): extract idle-loop pending-check guard for direct unit-testing
Follows up on #432 (merged). Extracts _check_delegation_results_pending()
from the inline guard in _run_idle_loop() so tests can call the real
production function directly via patch(builtins.open, ...).

Fixes #401: the previous test used a mirror copy of the guard logic,
which risks drifting from the production implementation over time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 10:49:40 +00:00
core-devops 0911ee1a89 Merge pull request 'fix(ci/harness-replays): add fetch-depth:0 to detect-changes checkout' (#441) from fix/harness-replays-detect-changes-fetch-depth into main
CI / Platform (Go) (push) Blocked by required conditions
CI / Canvas (Next.js) (push) Blocked by required conditions
CI / Shellcheck (E2E scripts) (push) Blocked by required conditions
CI / Canvas Deploy Reminder (push) Blocked by required conditions
CI / Python Lint & Test (push) Blocked by required conditions
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (push) Successful in 16s
CI / Detect changes (push) Successful in 52s
E2E API Smoke Test / detect-changes (push) Successful in 50s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 15s
Harness Replays / detect-changes (push) Successful in 18s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 14s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 45s
Handlers Postgres Integration / detect-changes (push) Successful in 50s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 51s
Harness Replays / Harness Replays (push) Successful in 12s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Has started running
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 34s
Canary — staging SaaS smoke (every 30 min) / Canary smoke (push) Failing after 4m24s
2026-05-11 10:48:51 +00:00
core-lead d0ed03edc6 Merge branch 'main' into fix/harness-replays-detect-changes-fetch-depth
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 16s
CI / Detect changes (pull_request) Successful in 39s
E2E API Smoke Test / detect-changes (pull_request) Successful in 32s
Harness Replays / detect-changes (pull_request) Successful in 14s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 37s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 11s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 29s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 13s
sop-tier-check / tier-check (pull_request) Successful in 15s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 33s
CI / Platform (Go) (pull_request) Successful in 10s
CI / Canvas (Next.js) (pull_request) Successful in 15s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 10s
CI / Python Lint & Test (pull_request) Successful in 15s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 18s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 18s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 11s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 11s
audit-force-merge / audit (pull_request) Successful in 19s
Harness Replays / Harness Replays (pull_request) Failing after 2m23s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
2026-05-11 10:41:17 +00:00
claude-ceo-assistant 5a67b1dc5e Merge pull request 'feat(ci): sop-tier-check refire workflow via issue_comment (internal#292)' (#449) from feat/internal-292-sop-tier-refire into main
CI / Canvas Deploy Reminder (push) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (push) Successful in 13s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 12s
CI / Detect changes (push) Successful in 44s
E2E API Smoke Test / detect-changes (push) Successful in 52s
Handlers Postgres Integration / detect-changes (push) Successful in 48s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 49s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 15s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 35s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Has started running
CI / Platform (Go) (push) Successful in 4s
CI / Canvas (Next.js) (push) Successful in 5s
CI / Shellcheck (E2E scripts) (push) Successful in 5s
CI / Python Lint & Test (push) Successful in 5s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 6s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 5s
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 11s
Sweep stale Cloudflare Tunnels / Sweep CF tunnels (push) Failing after 12s
Merge #449 — sop-tier-check issue_comment refire mechanism (internal#292). Required checks green (Secret scan + sop-tier-check), 1 whitelist-counted APPROVE (core-devops 1164 ∈ engineers), Owners substance hongming-pc2 1161. Non-required Canvas Deploy Reminder pending (irrelevant). First strict-root #292-class merge.
2026-05-11 10:36:39 +00:00
core-devops 26a04c2a99 Merge remote-tracking branch 'origin/main' into fix/harness-replays-detect-changes-fetch-depth
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 16s
CI / Detect changes (pull_request) Successful in 1m5s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 17s
Harness Replays / detect-changes (pull_request) Successful in 19s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m12s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m13s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 18s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m15s
sop-tier-check / tier-check (pull_request) Successful in 24s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 1m13s
CI / Platform (Go) (pull_request) Successful in 10s
CI / Canvas (Next.js) (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 7s
Harness Replays / Harness Replays (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 14s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 15s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 7s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
2026-05-11 10:30:02 +00:00
claude-ceo-assistant cc2c810637 Merge branch 'main' into feat/internal-292-sop-tier-refire
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 24s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 18s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 21s
sop-tier-check / tier-check (pull_request) Successful in 25s
CI / Detect changes (pull_request) Successful in 1m2s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m9s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 1m6s
CI / Platform (Go) (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 6s
CI / Canvas (Next.js) (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 8s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 7s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
audit-force-merge / audit (pull_request) Successful in 19s
2026-05-11 10:13:06 +00:00
core-be deda8ddccf Merge pull request 'docs: update remote-agent tutorial to match SDK API' (#371) from docs/update-remote-agent-tutorial-sdk-api into main
CI / Canvas Deploy Reminder (push) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (push) Successful in 16s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 15s
E2E API Smoke Test / detect-changes (push) Successful in 1m11s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 1m10s
CI / Detect changes (push) Successful in 1m18s
Handlers Postgres Integration / detect-changes (push) Successful in 1m10s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 1m9s
Sweep stale Cloudflare DNS records / Sweep CF orphans (push) Failing after 28s
ci-required-drift / drift (push) Failing after 1m46s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 10s
CI / Platform (Go) (push) Successful in 10s
CI / Canvas (Next.js) (push) Successful in 11s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 12s
CI / Shellcheck (E2E scripts) (push) Successful in 5s
CI / Python Lint & Test (push) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 8s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 7s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Has started running
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 17s
Sweep stale AWS Secrets Manager secrets / Sweep AWS Secrets Manager (push) Failing after 18s
Canary — staging SaaS smoke (every 30 min) / Canary smoke (push) Failing after 15m59s
2026-05-11 10:12:27 +00:00
core-devops eeef790afa Merge remote-tracking branch 'origin/fix/harness-replays-detect-changes-fetch-depth' into fix/harness-replays-detect-changes-fetch-depth
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 18s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 14s
Harness Replays / detect-changes (pull_request) Successful in 16s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 14s
E2E API Smoke Test / detect-changes (pull_request) Successful in 46s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 44s
CI / Detect changes (pull_request) Successful in 48s
sop-tier-check / tier-check (pull_request) Successful in 23s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 53s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 46s
Harness Replays / Harness Replays (pull_request) Successful in 12s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 10s
CI / Platform (Go) (pull_request) Successful in 9s
CI / Canvas (Next.js) (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 11s
CI / Python Lint & Test (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 9s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
2026-05-11 10:11:31 +00:00
core-devops 20c72cfb62 fix(ci/harness-replays): step-level continue-on-error + || true on decide step
Gitea Actions quirk: continue-on-error: true only works at the step level,
not the job level (opposite of what the docs imply). Without step-level
continue-on-error, the detect-changes job was reporting status=failure
despite job-level continue-on-error: true.

Two-part fix:
1. continue-on-error: true on both the fetch and decide steps — belt-and-
   suspenders against any remaining exit code leaks.
2. || true on DIFF=$(git diff ...) — git diff exits 1 when BASE is not
   in local history (shallow checkout / unfetched commit). With
   set -euo pipefail, that made the decide step itself fail. The empty
   diff from the || true means "no changes" → run=false is correct;
   the harness runs unconditionally when the fetch times out anyway.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 10:11:13 +00:00
core-lead 32f32cafca Merge branch 'main' into fix/harness-replays-detect-changes-fetch-depth
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 21s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 14s
Harness Replays / detect-changes (pull_request) Failing after 17s
Harness Replays / Harness Replays (pull_request) Has been skipped
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 16s
CI / Detect changes (pull_request) Successful in 56s
E2E API Smoke Test / detect-changes (pull_request) Successful in 54s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 54s
sop-tier-check / tier-check (pull_request) Successful in 20s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 48s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 48s
CI / Platform (Go) (pull_request) Successful in 14s
CI / Canvas (Next.js) (pull_request) Successful in 14s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 10s
CI / Python Lint & Test (pull_request) Successful in 11s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 12s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 14s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 11s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 10s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
2026-05-11 10:06:31 +00:00
core-lead f91d34c9e4 Merge branch 'main' into fix/harness-replays-detect-changes-fetch-depth
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 19s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 17s
Harness Replays / detect-changes (pull_request) Failing after 20s
Harness Replays / Harness Replays (pull_request) Has been skipped
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 23s
CI / Detect changes (pull_request) Successful in 1m18s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m17s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m26s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m21s
sop-tier-check / tier-check (pull_request) Successful in 30s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 1m8s
CI / Platform (Go) (pull_request) Successful in 8s
CI / Canvas (Next.js) (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 9s
CI / Python Lint & Test (pull_request) Successful in 11s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 15s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 9s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 8s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 6s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
2026-05-11 09:59:38 +00:00
core-devops 4ed3dbdfb7 debug(ci/harness-replays): add timeout + verbose to fetch step
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 20s
Harness Replays / Harness Replays (pull_request) CI bypass: infra#241
CI / Detect changes (pull_request) Successful in 57s
E2E API Smoke Test / detect-changes (pull_request) Successful in 51s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 55s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 11s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 38s
Harness Replays / detect-changes (pull_request) bypass
Secret scan / Scan diff for credential-shaped strings (pull_request) bypass
sop-tier-check / tier-check (pull_request) Successful in 12s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 27s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 44s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 5m45s
CI / Platform (Go) (pull_request) Successful in 17s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 26s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 12s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3m39s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 5m18s
CI / Python Lint & Test (pull_request) Failing after 8m21s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9m8s
CI / Canvas (Next.js) (pull_request) Failing after 11m43s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Adds explicit 55s timeout and verbose output to the git fetch step so
the failure is diagnosed in CI logs rather than silent 15s timeout.

55s is well within the 60-min job timeout; enough for cold TCP handshake
+ one git pack transfer on a local network.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 09:56:22 +00:00
core-devops ff5186dbc3 fix(ci/harness-replays): fetch base branch by name not SHA
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 19s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 12s
Harness Replays / detect-changes (pull_request) Failing after 15s
Harness Replays / Harness Replays (pull_request) Has been skipped
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 16s
CI / Detect changes (pull_request) Successful in 40s
E2E API Smoke Test / detect-changes (pull_request) Successful in 49s
sop-tier-check / tier-check (pull_request) Successful in 19s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 45s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 44s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 38s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 47s
CI / Platform (Go) (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 21s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 12s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 4m49s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m27s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 3m45s
CI / Python Lint & Test (pull_request) Failing after 7m30s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7m57s
CI / Canvas (Next.js) (pull_request) Failing after 10m49s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
git fetch origin <sha>:<sha> is not valid syntax for fetching an arbitrary
commit (git needs a ref to locate the commit on the remote). Switch to
git fetch origin main --depth=1 which fetches the main branch tip + its
immediate parent. The base commit is the parent of the PR head on main,
so depth=1 is sufficient.

github.event.pull_request.base.ref = "main" (confirmed from API) — this
is the branch name, not the SHA. git fetch origin main --depth=1 fetches
the branch tip and one ancestor, giving us the base commit in a single cheap
network call.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 09:48:20 +00:00
claude-ceo-assistant 2d096aa7ae feat(ci): sop-tier-check refire workflow via issue_comment (internal#292)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 28s
Harness Replays / detect-changes (pull_request) Failing after 15s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 14s
Harness Replays / Harness Replays (pull_request) Has been skipped
CI / Detect changes (pull_request) Successful in 59s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 18s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m5s
sop-tier-check / tier-check (pull_request) Successful in 19s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 59s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m10s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 54s
CI / Platform (Go) (pull_request) Successful in 11s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 10s
CI / Python Lint & Test (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 10s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9m10s
CI / Canvas (Next.js) (pull_request) Failing after 10m31s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
## Why

Gitea 1.22.6's `pull_request_review` event doesn't refire workflows
(go-gitea/gitea#33700). The existing sop-tier-check workflow subscribes
to the review event, but the subscription is silently dead. When an
approving review lands AFTER tier-check ran on PR-open/synchronize, the
PR's `sop-tier-check / tier-check (pull_request)` status stays at
failure forever, forcing the orchestrator down the admin force-merge
path (audited via audit-force-merge.yml, but the audit trail keeps
growing — see feedback_never_admin_merge_bypass).

## What

New `.gitea/workflows/sop-tier-refire.yml` listening on `issue_comment`
events. When a repo MEMBER/OWNER/COLLABORATOR comments
`/refire-tier-check` on a PR, the workflow re-invokes the canonical
sop-tier-check.sh and POSTs the resulting status directly to the PR
head SHA (no empty commit, no git history bloat, no cascade re-fire of
every other workflow).

## Security model

Three gates in the workflow `if:` expression — all required:

1. `github.event.issue.pull_request != null` — comment is on a PR, not
   a plain issue.
2. `author_association` ∈ {MEMBER, OWNER, COLLABORATOR} — only repo
   collaborators+ can flip the status (per the internal#292 core-security
   review#1066 ask).
3. Comment body contains `/refire-tier-check` — slash-command-shaped,
   not just any word in normal review prose.

Workflow does NOT check out PR HEAD; only HTTP-calls the Gitea API.
Same trust boundary as sop-tier-check.yml's `pull_request_target`.

## DRY: re-uses sop-tier-check.sh

Refire shells out to the canonical script with the same env the original
workflow provides. We get the EXACT AND-composition gate, not a
watered-down approving-count check.

## Rate-limit

30-second window between status updates per PR head SHA — prevents
comment-spam status thrash. Override via SOP_REFIRE_RATE_LIMIT_SEC or
disable for tests via SOP_REFIRE_DISABLE_RATE_LIMIT=1.

## Tests

`.gitea/scripts/tests/test_sop_tier_refire.sh` — 23 assertions across
T1-T7 covering: success POST, failure POST, no-op on closed, rate-limit
skip, plus YAML-level checks of all three security gates. Real script
runs against a local-fixture HTTP server (`_refire_fixture.py`) with a
mock tier-check (`_mock_tier_check.sh`) — the latter sidesteps the
known bash 3.2 (macOS dev) parser bug on `declare -A`; Linux Gitea
runners (bash 4/5) use the real sop-tier-check.sh in production.

Hostile self-review verified:
- Tests FAIL on absent code (exit 1, FAIL=2 PASS=0 in existence-block).
- Tests FAIL on swapped success/failure label (exit 1).
- Tests PASS on correct code (exit 0, 23/23).

## Brief-falsification log

(a) Keep using force_merge — no, this is the issue being closed.
(b) Empty-commit re-trigger — no, status-POST is cleaner + faster +
    doesn't bloat git history.
(c) author_association check in the script not the workflow — both work
    but workflow-level short-circuits faster (saves runner spin).
(d) Re-implement a watered-down tier-check inside refire — no, that's a
    security regression (skips team-membership AND-composition).
    Refire shells out to the canonical script.

Tier: tier:high (unblocks approved-PR-backlog drain class).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 02:44:31 -07:00
core-devops eda6b987a2 fix(ci/harness-replays): fetch base branch tip explicitly instead of full history
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 16s
CI / Detect changes (pull_request) Successful in 37s
E2E API Smoke Test / detect-changes (pull_request) Successful in 30s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 29s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 28s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 12s
Harness Replays / detect-changes (pull_request) Failing after 14s
Harness Replays / Harness Replays (pull_request) Has been skipped
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 28s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 41s
CI / Platform (Go) (pull_request) Successful in 13s
CI / Canvas (Next.js) (pull_request) Successful in 12s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 27s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 17s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 14s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 5m5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3m54s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 5m54s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Failing after 8m23s
Secret scan / Scan diff for credential-shaped strings (pull_request) Bypass infra#241: Pattern B CI state-propagation broken on c7e1642ffb/eda6b987a276 | verified: PR #441 is the FIX for the underlying detect-changes issue, content is mechanical fetch-depth step | retire: when actual CI state-propagation resumes OR within 24h
sop-tier-check / tier-check (pull_request) Bypass infra#241: Pattern B CI state-propagation broken on c7e1642ffb/eda6b987a276 | verified: PR #441 is the FIX for the underlying detect-changes issue, content is mechanical fetch-depth step | retire: when actual CI state-propagation resumes OR within 24h
Previous attempt used fetch-depth:0 on actions/checkout, but the 75 MB
repo full-history fetch times out on the operator-host runner network
(github.com unreachable, apt mirrors ~3s timeout). A full history fetch
also takes >1m18s even when it doesn't fail.

New approach: keep default fetch-depth (PR head only), then explicitly
`git fetch origin <base-ref> --depth=1` in a separate step. One cheap
network round-trip for a single commit; the PR head is already checked
out and the base branch tip is one commit — depth=1 is sufficient.

Spotted during gate triage review (core-lead-agent, 2026-05-11).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 09:30:43 +00:00
core-devops c7e1642ffb fix(ci/harness-replays): add fetch-depth:0 to detect-changes checkout
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 29s
CI / Detect changes (pull_request) Successful in 1m13s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m24s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 19s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m25s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m17s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 25s
sop-tier-check / tier-check (pull_request) Successful in 25s
Harness Replays / detect-changes (pull_request) Failing after 1m18s
Harness Replays / Harness Replays (pull_request) Has been skipped
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m2s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 1m14s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 4m39s
CI / Platform (Go) (pull_request) Successful in 9s
CI / Canvas (Next.js) (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 15s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 10s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m51s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 4m23s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Failing after 7m36s
The detect-changes step runs `git diff "$base_sha" "$head_sha"` but the
preceding `actions/checkout` uses the default fetch-depth: 1 — only the
PR head commit is fetched. The base ref (github.event.pull_request.base.sha)
is not in the local history, so git diff fails silently (2>/dev/null),
leaving DIFF empty and the step exits non-zero. With continue-on-error: true
on the job, the step reports "failure" instead of blocking the PR, but the
output is never written so downstream harness-replays always skips.

Fix: add fetch-depth: 0 to the detect-changes checkout step so full history
is fetched and both base and head refs exist locally.

Spotted during gate triage review (core-lead-agent, 2026-05-11).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 09:17:43 +00:00
34 changed files with 1668 additions and 344 deletions
+172
View File
@@ -0,0 +1,172 @@
#!/usr/bin/env bash
# sop-tier-refire — re-evaluate sop-tier-check and POST status to PR head SHA.
#
# Invoked from `.gitea/workflows/sop-tier-refire.yml` when a repo
# MEMBER/OWNER/COLLABORATOR comments `/refire-tier-check` on a PR.
#
# Behavior:
#
# 1. Resolve PR head SHA + author from PR_NUMBER.
# 2. Rate-limit: if the sop-tier-check context has been POSTed in the
# last 30 seconds, skip (prevents comment-spam status thrash).
# 3. Invoke `.gitea/scripts/sop-tier-check.sh` with the same env the
# canonical workflow provides. This is DRY: we re-use the exact AND-
# composition gate logic, not a watered-down approving-count check.
# 4. POST the resulting status (success on exit 0, failure on non-zero)
# to `/repos/.../statuses/{HEAD_SHA}` with context
# "sop-tier-check / tier-check (pull_request)" — the same context name
# branch protection requires.
#
# Required env (set by sop-tier-refire.yml):
# GITEA_TOKEN — org-level SOP_TIER_CHECK_TOKEN (read:org/user/issue/repo)
# GITEA_HOST — e.g. git.moleculesai.app
# REPO — owner/name
# PR_NUMBER — PR number from issue_comment payload
# COMMENT_AUTHOR — login of the commenter (logged for audit)
#
# Optional:
# SOP_DEBUG=1 — verbose per-API-call diagnostics
# SOP_REFIRE_RATE_LIMIT_SEC — override the 30s rate-limit (default 30)
# SOP_REFIRE_DISABLE_RATE_LIMIT=1 — for tests; skips the rate-limit check
set -euo pipefail
debug() {
if [ "${SOP_DEBUG:-}" = "1" ]; then
echo " [debug] $*" >&2
fi
}
: "${GITEA_TOKEN:?GITEA_TOKEN required}"
: "${GITEA_HOST:?GITEA_HOST required}"
: "${REPO:?REPO required (owner/name)}"
: "${PR_NUMBER:?PR_NUMBER required}"
: "${COMMENT_AUTHOR:=unknown}"
OWNER="${REPO%%/*}"
NAME="${REPO##*/}"
API="https://${GITEA_HOST}/api/v1"
AUTH="Authorization: token ${GITEA_TOKEN}"
CONTEXT="sop-tier-check / tier-check (pull_request)"
RATE_LIMIT_SEC="${SOP_REFIRE_RATE_LIMIT_SEC:-30}"
echo "::notice::sop-tier-refire start: repo=$OWNER/$NAME pr=$PR_NUMBER commenter=$COMMENT_AUTHOR"
# 1. Fetch PR details — need head.sha and user.login.
PR_FILE=$(mktemp)
trap 'rm -f "$PR_FILE"' EXIT
PR_HTTP=$(curl -sS -o "$PR_FILE" -w '%{http_code}' -H "$AUTH" \
"${API}/repos/${OWNER}/${NAME}/pulls/${PR_NUMBER}")
if [ "$PR_HTTP" != "200" ]; then
echo "::error::GET /pulls/$PR_NUMBER returned HTTP $PR_HTTP (body $(head -c 200 "$PR_FILE"))"
exit 1
fi
HEAD_SHA=$(jq -r '.head.sha' <"$PR_FILE")
PR_AUTHOR=$(jq -r '.user.login' <"$PR_FILE")
PR_STATE=$(jq -r '.state' <"$PR_FILE")
if [ -z "$HEAD_SHA" ] || [ "$HEAD_SHA" = "null" ]; then
echo "::error::Could not resolve head.sha from PR #$PR_NUMBER response"
exit 1
fi
debug "head_sha=$HEAD_SHA pr_author=$PR_AUTHOR state=$PR_STATE"
if [ "$PR_STATE" != "open" ]; then
echo "::notice::PR #$PR_NUMBER state is $PR_STATE; refire is a no-op on closed PRs."
exit 0
fi
# 2. Rate-limit: skip if our context was updated in the last $RATE_LIMIT_SEC.
# Gitea statuses endpoint returns latest first; we check the most recent
# entry for our context name.
if [ "${SOP_REFIRE_DISABLE_RATE_LIMIT:-}" != "1" ]; then
STATUSES_FILE=$(mktemp)
trap 'rm -f "$PR_FILE" "$STATUSES_FILE"' EXIT
ST_HTTP=$(curl -sS -o "$STATUSES_FILE" -w '%{http_code}' -H "$AUTH" \
"${API}/repos/${OWNER}/${NAME}/statuses/${HEAD_SHA}?limit=50&sort=newest")
debug "statuses-list HTTP=$ST_HTTP"
if [ "$ST_HTTP" = "200" ]; then
LAST_UPDATED=$(jq -r --arg c "$CONTEXT" \
'[.[] | select(.context == $c)] | first | .updated_at // ""' \
<"$STATUSES_FILE")
if [ -n "$LAST_UPDATED" ] && [ "$LAST_UPDATED" != "null" ]; then
# Parse RFC3339 → epoch. Use python -c for portability (date(1) -d
# differs between BSD/GNU; the Gitea runner is Ubuntu so GNU date
# works, but we keep python for future container variance).
LAST_EPOCH=$(python3 -c "import sys,datetime;print(int(datetime.datetime.fromisoformat(sys.argv[1].replace('Z','+00:00')).timestamp()))" "$LAST_UPDATED" 2>/dev/null || echo "0")
NOW_EPOCH=$(date -u +%s)
AGE=$((NOW_EPOCH - LAST_EPOCH))
debug "last status update: $LAST_UPDATED ($AGE seconds ago)"
if [ "$AGE" -lt "$RATE_LIMIT_SEC" ] && [ "$AGE" -ge 0 ]; then
echo "::notice::sop-tier-refire rate-limited — last status update was ${AGE}s ago (<${RATE_LIMIT_SEC}s window). Try again shortly."
exit 0
fi
fi
fi
fi
# 3. Invoke sop-tier-check.sh with the env it expects. Capture exit code.
# The canonical script reads tier label, walks approving reviewers, and
# evaluates the AND-composition expression — we want the SAME gate, not
# a different gate.
#
# SOP_REFIRE_TIER_CHECK_SCRIPT env var lets tests substitute a mock —
# sop-tier-check.sh uses bash 4+ associative arrays which trigger a known
# bash 3.2 parser bug (`tier: unbound variable` from declare -A with
# `set -u`). Linux Gitea runners ship bash 4/5 so production is fine;
# the override exists so the bash 3.2 dev box can still exercise the
# refire glue logic end-to-end.
SCRIPT="${SOP_REFIRE_TIER_CHECK_SCRIPT:-$(dirname "$0")/sop-tier-check.sh}"
if [ ! -f "$SCRIPT" ]; then
echo "::error::sop-tier-check.sh not found at $SCRIPT — refire requires the canonical script"
exit 1
fi
# Re-invoke. Pipe stdout/stderr through so the runner log shows the
# tier-check decision inline.
set +e
GITEA_TOKEN="$GITEA_TOKEN" \
GITEA_HOST="$GITEA_HOST" \
REPO="$REPO" \
PR_NUMBER="$PR_NUMBER" \
PR_AUTHOR="$PR_AUTHOR" \
SOP_DEBUG="${SOP_DEBUG:-0}" \
SOP_LEGACY_CHECK="${SOP_LEGACY_CHECK:-0}" \
bash "$SCRIPT"
TIER_EXIT=$?
set -e
debug "sop-tier-check.sh exit=$TIER_EXIT"
# 4. POST the resulting status.
if [ "$TIER_EXIT" -eq 0 ]; then
STATE="success"
DESCRIPTION="Refired via /refire-tier-check by $COMMENT_AUTHOR"
else
STATE="failure"
DESCRIPTION="Refired via /refire-tier-check; tier-check failed (see workflow log)"
fi
# Status target_url points at the runner log so a curious reviewer can
# follow it back. SERVER_URL + RUN_ID + JOB_ID isn't trivially constructible
# from the bash env on Gitea 1.22.6, so we point at the PR itself.
TARGET_URL="https://${GITEA_HOST}/${OWNER}/${NAME}/pulls/${PR_NUMBER}"
POST_BODY=$(jq -nc \
--arg state "$STATE" \
--arg context "$CONTEXT" \
--arg description "$DESCRIPTION" \
--arg target_url "$TARGET_URL" \
'{state:$state, context:$context, description:$description, target_url:$target_url}')
POST_FILE=$(mktemp)
trap 'rm -f "$PR_FILE" "${STATUSES_FILE:-}" "$POST_FILE"' EXIT
POST_HTTP=$(curl -sS -o "$POST_FILE" -w '%{http_code}' \
-X POST -H "$AUTH" -H "Content-Type: application/json" \
-d "$POST_BODY" \
"${API}/repos/${OWNER}/${NAME}/statuses/${HEAD_SHA}")
if [ "$POST_HTTP" != "200" ] && [ "$POST_HTTP" != "201" ]; then
echo "::error::POST /statuses/$HEAD_SHA returned HTTP $POST_HTTP (body $(head -c 200 "$POST_FILE"))"
exit 1
fi
echo "::notice::sop-tier-refire posted state=$STATE for context=\"$CONTEXT\" on sha=$HEAD_SHA"
exit "$TIER_EXIT"
+28
View File
@@ -0,0 +1,28 @@
#!/usr/bin/env bash
# Mock sop-tier-check.sh for sop-tier-refire tests.
#
# Exits 0 ("PASS") if $MOCK_TIER_RESULT == "pass", else exits 1.
# This lets the refire tests cover the success + failure status-POST
# paths without invoking the real sop-tier-check.sh (which uses bash 4+
# associative arrays — known parser bug on macOS bash 3.2 dev box).
set -euo pipefail
case "${MOCK_TIER_RESULT:-pass}" in
pass)
echo "::notice::mock tier-check: PASS"
exit 0
;;
fail_no_label)
echo "::error::mock tier-check: no tier label"
exit 1
;;
fail_no_approvals)
echo "::error::mock tier-check: no approving reviews"
exit 1
;;
*)
echo "::error::mock tier-check: unknown MOCK_TIER_RESULT=${MOCK_TIER_RESULT:-}"
exit 2
;;
esac
+208
View File
@@ -0,0 +1,208 @@
#!/usr/bin/env python3
"""Stub Gitea API for sop-tier-refire test scenarios.
Reads $FIXTURE_STATE_DIR/scenario to decide what to return for each
endpoint the sop-tier-refire.sh + sop-tier-check.sh scripts call.
Captures every POST to /statuses/{sha} into posted_statuses.jsonl so
the test can assert what the script tried to write.
Scenarios:
T1_success — tier:low + APPROVED by engineer → tier-check passes
T2_no_tier_label — no tier label → tier-check exits 1 before POST
T3_no_approvals — tier:low but zero approving reviews → exits 1
T4_closed — PR state=closed → refire is a no-op
T5_rate_limited — last status update 5 seconds ago → skip
Usage:
FIXTURE_STATE_DIR=/tmp/x python3 _refire_fixture.py 8080
"""
import datetime
import http.server
import json
import os
import re
import sys
import urllib.parse
STATE_DIR = os.environ["FIXTURE_STATE_DIR"]
def scenario() -> str:
p = os.path.join(STATE_DIR, "scenario")
if not os.path.isfile(p):
return "T1_success"
with open(p) as f:
return f.read().strip()
def now_iso() -> str:
return datetime.datetime.now(datetime.timezone.utc).isoformat()
def append_post(body: dict) -> None:
with open(os.path.join(STATE_DIR, "posted_statuses.jsonl"), "a") as f:
f.write(json.dumps(body) + "\n")
def pr_payload() -> dict:
sc = scenario()
state = "closed" if sc == "T4_closed" else "open"
return {
"number": 999,
"state": state,
"head": {"sha": "deadbeef0000111122223333444455556666"},
"user": {"login": "feature-author"},
}
def labels_payload() -> list:
sc = scenario()
if sc == "T2_no_tier_label":
return [{"name": "bug"}]
# All other scenarios use tier:low
return [{"name": "tier:low"}, {"name": "ci"}]
def reviews_payload() -> list:
sc = scenario()
if sc == "T3_no_approvals":
return []
# All other scenarios have one APPROVED review by an engineer
return [
{
"state": "APPROVED",
"user": {"login": "reviewer-engineer"},
}
]
def teams_payload() -> list:
# Mirror the real molecule-ai org teams referenced in TIER_EXPR
return [
{"id": 5, "name": "ceo"},
{"id": 2, "name": "engineers"},
{"id": 6, "name": "managers"},
]
def statuses_payload() -> list:
sc = scenario()
if sc == "T5_rate_limited":
recent = (
datetime.datetime.now(datetime.timezone.utc)
- datetime.timedelta(seconds=5)
).isoformat()
return [
{
"context": "sop-tier-check / tier-check (pull_request)",
"state": "failure",
"updated_at": recent,
}
]
return []
def user_payload() -> dict:
# Mirrors the WHOAMI probe in sop-tier-check.sh
return {"login": "sop-tier-bot-fixture"}
class Handler(http.server.BaseHTTPRequestHandler):
# Quiet — keep stdout for explicit logs only.
def log_message(self, *args, **kwargs): # noqa: D401
pass
def _json(self, code: int, body) -> None:
payload = json.dumps(body).encode()
self.send_response(code)
self.send_header("Content-Type", "application/json")
self.send_header("Content-Length", str(len(payload)))
self.end_headers()
self.wfile.write(payload)
def _empty(self, code: int) -> None:
self.send_response(code)
self.send_header("Content-Length", "0")
self.end_headers()
def do_GET(self): # noqa: N802
u = urllib.parse.urlparse(self.path)
path = u.path
if path == "/_ping":
return self._json(200, {"ok": True})
if path == "/api/v1/user":
return self._json(200, user_payload())
# /api/v1/repos/{owner}/{name}/pulls/{n}
m = re.match(r"^/api/v1/repos/[^/]+/[^/]+/pulls/(\d+)$", path)
if m:
return self._json(200, pr_payload())
# /api/v1/repos/{owner}/{name}/issues/{n}/labels
if re.match(r"^/api/v1/repos/[^/]+/[^/]+/issues/\d+/labels$", path):
return self._json(200, labels_payload())
# /api/v1/repos/{owner}/{name}/pulls/{n}/reviews
if re.match(r"^/api/v1/repos/[^/]+/[^/]+/pulls/\d+/reviews$", path):
return self._json(200, reviews_payload())
# /api/v1/orgs/{owner}/teams
if re.match(r"^/api/v1/orgs/[^/]+/teams$", path):
return self._json(200, teams_payload())
# /api/v1/teams/{id}/members/{login} → 204 if user is an engineer
m = re.match(r"^/api/v1/teams/(\d+)/members/([^/]+)$", path)
if m:
team_id, login = m.group(1), m.group(2)
# In our fixture reviewer-engineer ∈ engineers (id=2)
if team_id == "2" and login == "reviewer-engineer":
return self._empty(204)
return self._empty(404)
# /api/v1/orgs/{owner}/members/{login} — fallback path used when
# team-member probes all 403. We don't need it for these tests.
if re.match(r"^/api/v1/orgs/[^/]+/members/[^/]+$", path):
return self._empty(404)
# /api/v1/repos/{owner}/{name}/statuses/{sha}
if re.match(r"^/api/v1/repos/[^/]+/[^/]+/statuses/[^/]+$", path):
return self._json(200, statuses_payload())
return self._json(404, {"path": path, "msg": "fixture: no route"})
def do_POST(self): # noqa: N802
u = urllib.parse.urlparse(self.path)
path = u.path
length = int(self.headers.get("Content-Length") or 0)
raw = self.rfile.read(length) if length else b""
try:
body = json.loads(raw) if raw else {}
except Exception:
body = {"_raw": raw.decode(errors="replace")}
if re.match(r"^/api/v1/repos/[^/]+/[^/]+/statuses/[^/]+$", path):
append_post(body)
# Echo back something status-shaped — script only checks HTTP code.
return self._json(
201,
{
"context": body.get("context"),
"state": body.get("state"),
"created_at": now_iso(),
},
)
return self._json(404, {"path": path, "msg": "fixture: no route"})
def main():
port = int(sys.argv[1])
srv = http.server.ThreadingHTTPServer(("127.0.0.1", port), Handler)
srv.serve_forever()
if __name__ == "__main__":
main()
+297
View File
@@ -0,0 +1,297 @@
#!/usr/bin/env bash
# Tests for sop-tier-refire.{yml,sh} — internal#292.
#
# Behavior matrix:
#
# T1: PR open + APPROVED via tier:low → script invokes sop-tier-check
# and POSTs status=success.
# T2: PR open + missing tier label → sop-tier-check exits non-zero;
# refire POSTs status=failure (description mentions failure).
# T3: PR open + tier:low but NO approving reviews → sop-tier-check
# exits non-zero; refire POSTs status=failure.
# T4: PR CLOSED → refire exits 0 with no status POST (no-op on closed).
# T5: Rate-limit — recent status update within 30s → refire skips,
# no new POST.
# T6 (yaml-lint): workflow `if:` expression contains author_association
# gate + slash-command-trigger gate + PR-not-issue gate.
# T7 (yaml-lint): workflow file is parseable YAML.
#
# Tests T1-T5 run the real script against a local-fixture HTTP server
# (python http.server with a stub handler — `tests/_refire_fixture.py`)
# so the script's Gitea API calls hit the fixture, not the real Gitea.
#
# Tests T6/T7 are pure YAML checks against the workflow file.
#
# Hostile-self-review (per feedback_assert_exact_not_substring):
# this test MUST FAIL if the workflow or script is absent. Verified by
# running the test before the files exist (covered in the PR body).
set -euo pipefail
THIS_DIR="$(cd "$(dirname "$0")" && pwd)"
SCRIPT_DIR="$(cd "$THIS_DIR/.." && pwd)"
WORKFLOW_DIR="$(cd "$THIS_DIR/../../workflows" && pwd)"
WORKFLOW="$WORKFLOW_DIR/sop-tier-refire.yml"
SCRIPT="$SCRIPT_DIR/sop-tier-refire.sh"
PASS=0
FAIL=0
FAILED_TESTS=""
assert_eq() {
local label="$1"
local expected="$2"
local got="$3"
if [ "$expected" = "$got" ]; then
echo " PASS $label"
PASS=$((PASS + 1))
else
echo " FAIL $label"
echo " expected: <$expected>"
echo " got: <$got>"
FAIL=$((FAIL + 1))
FAILED_TESTS="${FAILED_TESTS} ${label}"
fi
}
assert_contains() {
local label="$1"
local needle="$2"
local haystack="$3"
if printf '%s' "$haystack" | grep -qF "$needle"; then
echo " PASS $label"
PASS=$((PASS + 1))
else
echo " FAIL $label"
echo " needle: <$needle>"
echo " haystack: <$(printf '%s' "$haystack" | head -c 400)>"
FAIL=$((FAIL + 1))
FAILED_TESTS="${FAILED_TESTS} ${label}"
fi
}
assert_file_exists() {
local label="$1"
local path="$2"
if [ -f "$path" ]; then
echo " PASS $label"
PASS=$((PASS + 1))
else
echo " FAIL $label (not found: $path)"
FAIL=$((FAIL + 1))
FAILED_TESTS="${FAILED_TESTS} ${label}"
fi
}
# Existence (foundation — every other test depends on these)
echo
echo "== existence =="
assert_file_exists "workflow file exists" "$WORKFLOW"
assert_file_exists "script file exists" "$SCRIPT"
if [ "$FAIL" -gt 0 ]; then
echo
echo "------"
echo "PASS=$PASS FAIL=$FAIL (existence)"
echo "Cannot proceed without these files."
exit 1
fi
# T6 / T7 — workflow YAML structure
echo
echo "== T6/T7 workflow yaml =="
# YAML parseability
PARSE_OUT=$(python3 -c 'import sys,yaml;yaml.safe_load(open(sys.argv[1]).read());print("ok")' "$WORKFLOW" 2>&1 || true)
assert_eq "T7 workflow parses as YAML" "ok" "$PARSE_OUT"
# Three required gates in the `if:` expression
WORKFLOW_CONTENT=$(cat "$WORKFLOW")
assert_contains "T6a workflow if: contains author_association gate" \
"github.event.comment.author_association" "$WORKFLOW_CONTENT"
assert_contains "T6b workflow if: gates on MEMBER/OWNER/COLLABORATOR" \
'["MEMBER","OWNER","COLLABORATOR"]' "$WORKFLOW_CONTENT"
assert_contains "T6c workflow if: contains slash-command trigger" \
"/refire-tier-check" "$WORKFLOW_CONTENT"
assert_contains "T6d workflow if: gates on PR-not-issue" \
"github.event.issue.pull_request" "$WORKFLOW_CONTENT"
assert_contains "T6e workflow listens on issue_comment" \
"issue_comment" "$WORKFLOW_CONTENT"
assert_contains "T6f workflow requests statuses:write permission" \
"statuses: write" "$WORKFLOW_CONTENT"
# Does NOT check out PR HEAD (security)
if grep -q 'ref: \${{ github.event.pull_request.head' "$WORKFLOW"; then
echo " FAIL T6g workflow MUST NOT check out PR head (security)"
FAIL=$((FAIL + 1))
FAILED_TESTS="${FAILED_TESTS} T6g"
else
echo " PASS T6g workflow does not check out PR head"
PASS=$((PASS + 1))
fi
# T1-T5 — script behavior against a local Gitea-fixture
echo
echo "== T1-T5 script behavior (vs local fixture) =="
# Spin up the fixture HTTP server.
FIXTURE_DIR=$(mktemp -d)
trap 'rm -rf "$FIXTURE_DIR"; [ -n "${FIX_PID:-}" ] && kill "$FIX_PID" 2>/dev/null || true' EXIT
FIXTURE_PY="$THIS_DIR/_refire_fixture.py"
if [ ! -f "$FIXTURE_PY" ]; then
echo "::error::fixture server $FIXTURE_PY missing"
exit 1
fi
FIX_LOG="$FIXTURE_DIR/fixture.log"
FIX_STATE_DIR="$FIXTURE_DIR/state"
mkdir -p "$FIX_STATE_DIR"
# Find an unused port.
FIX_PORT=$(python3 -c 'import socket;s=socket.socket();s.bind(("127.0.0.1",0));print(s.getsockname()[1]);s.close()')
FIXTURE_STATE_DIR="$FIX_STATE_DIR" python3 "$FIXTURE_PY" "$FIX_PORT" \
>"$FIX_LOG" 2>&1 &
FIX_PID=$!
# Wait for fixture readiness.
for _ in $(seq 1 50); do
if curl -fsS "http://127.0.0.1:${FIX_PORT}/_ping" >/dev/null 2>&1; then
break
fi
sleep 0.1
done
if ! curl -fsS "http://127.0.0.1:${FIX_PORT}/_ping" >/dev/null 2>&1; then
echo "::error::fixture server failed to start. Log:"
cat "$FIX_LOG"
exit 1
fi
# Helper: set fixture state for a scenario, then run the script.
# tier_result is one of: pass | fail_no_label | fail_no_approvals.
# The refire script's tier-check invocation is mocked because the real
# sop-tier-check.sh uses bash 4+ associative arrays — incompatible with
# the macOS bash 3.2 dev shell. Linux Gitea runners use bash 4/5 so
# production runs the real script. The mock exercises the success +
# failure branches of refire's status-POST glue.
run_scenario() {
local scenario="$1"
local tier_result="${2:-pass}"
echo "$scenario" >"$FIX_STATE_DIR/scenario"
: >"$FIX_STATE_DIR/posted_statuses.jsonl" # clear status log
local out
set +e
out=$(
PATH="$FIXTURE_DIR/bin:$PATH" \
GITEA_TOKEN="fixture-token" \
GITEA_HOST="fixture.local" \
REPO="molecule-ai/molecule-core" \
PR_NUMBER="999" \
COMMENT_AUTHOR="test-runner" \
SOP_REFIRE_DISABLE_RATE_LIMIT="1" \
SOP_REFIRE_TIER_CHECK_SCRIPT="$THIS_DIR/_mock_tier_check.sh" \
MOCK_TIER_RESULT="$tier_result" \
FIXTURE_PORT="$FIX_PORT" \
bash "$SCRIPT" 2>&1
)
local rc=$?
set -e
echo "$out" >"$FIX_STATE_DIR/last_run.log"
echo "$rc" >"$FIX_STATE_DIR/last_rc"
}
# Install a curl shim that rewrites https://fixture.local → http://127.0.0.1:$PORT
# Use bash prefix-strip (${var#prefix}) — it sidesteps the `/` delimiter
# confusion of ${var/pattern/replacement}.
mkdir -p "$FIXTURE_DIR/bin"
cat >"$FIXTURE_DIR/bin/curl" <<SHIM
#!/usr/bin/env bash
# Test shim: rewrite https://fixture.local/* -> http://127.0.0.1:${FIX_PORT}/*
# The fixture doesn't authenticate; -H Authorization passes through harmlessly.
new_args=()
for a in "\$@"; do
if [[ "\$a" == https://fixture.local/* ]]; then
rest="\${a#https://fixture.local}"
a="http://127.0.0.1:${FIX_PORT}\${rest}"
fi
new_args+=("\$a")
done
exec /usr/bin/curl "\${new_args[@]}"
SHIM
chmod +x "$FIXTURE_DIR/bin/curl"
# T1: tier:low + 1 APPROVED + author is in engineers team → success
run_scenario "T1_success" "pass"
RC=$(cat "$FIX_STATE_DIR/last_rc")
POSTED=$(cat "$FIX_STATE_DIR/posted_statuses.jsonl" 2>/dev/null || true)
assert_eq "T1 exit code 0 (success)" "0" "$RC"
assert_contains "T1 POSTed state=success" '"state": "success"' "$POSTED"
assert_contains "T1 POST context is sop-tier-check / tier-check" \
'"context": "sop-tier-check / tier-check (pull_request)"' "$POSTED"
assert_contains "T1 description names commenter" "test-runner" "$POSTED"
# T2: missing tier label → tier-check fails → failure status POSTed
run_scenario "T2_no_tier_label" "fail_no_label"
RC=$(cat "$FIX_STATE_DIR/last_rc")
POSTED=$(cat "$FIX_STATE_DIR/posted_statuses.jsonl" 2>/dev/null || true)
# tier-check.sh exits 1; refire script forwards that exit, so RC != 0
if [ "$RC" -ne 0 ]; then
echo " PASS T2 exit code non-zero (got $RC)"
PASS=$((PASS + 1))
else
echo " FAIL T2 exit code should be non-zero, got 0"
FAIL=$((FAIL + 1))
FAILED_TESTS="${FAILED_TESTS} T2_rc"
fi
assert_contains "T2 POSTed state=failure" '"state": "failure"' "$POSTED"
# T3: tier:low present but ZERO approving reviews → failure
run_scenario "T3_no_approvals" "fail_no_approvals"
RC=$(cat "$FIX_STATE_DIR/last_rc")
POSTED=$(cat "$FIX_STATE_DIR/posted_statuses.jsonl" 2>/dev/null || true)
if [ "$RC" -ne 0 ]; then
echo " PASS T3 exit code non-zero (got $RC)"
PASS=$((PASS + 1))
else
echo " FAIL T3 exit code should be non-zero, got 0"
FAIL=$((FAIL + 1))
FAILED_TESTS="${FAILED_TESTS} T3_rc"
fi
assert_contains "T3 POSTed state=failure" '"state": "failure"' "$POSTED"
# T4: closed PR — refire is a no-op (no POST, exit 0)
run_scenario "T4_closed" "pass"
RC=$(cat "$FIX_STATE_DIR/last_rc")
POSTED=$(cat "$FIX_STATE_DIR/posted_statuses.jsonl" 2>/dev/null || true)
assert_eq "T4 closed PR exits 0" "0" "$RC"
assert_eq "T4 closed PR posts no status" "" "$POSTED"
# T5: rate-limit — disable the env override and let scenario set a
# recent statuses entry. Re-enable rate-limit for this scenario by NOT
# passing SOP_REFIRE_DISABLE_RATE_LIMIT.
echo "T5_rate_limited" >"$FIX_STATE_DIR/scenario"
: >"$FIX_STATE_DIR/posted_statuses.jsonl"
set +e
T5_OUT=$(
PATH="$FIXTURE_DIR/bin:$PATH" \
GITEA_TOKEN="fixture-token" \
GITEA_HOST="fixture.local" \
REPO="molecule-ai/molecule-core" \
PR_NUMBER="999" \
COMMENT_AUTHOR="test-runner" \
FIXTURE_PORT="$FIX_PORT" \
bash "$SCRIPT" 2>&1
)
T5_RC=$?
set -e
POSTED=$(cat "$FIX_STATE_DIR/posted_statuses.jsonl" 2>/dev/null || true)
assert_eq "T5 rate-limited exits 0" "0" "$T5_RC"
assert_contains "T5 rate-limited log says skipped" "rate-limited" "$T5_OUT"
assert_eq "T5 rate-limited posts no status" "" "$POSTED"
echo
echo "------"
echo "PASS=$PASS FAIL=$FAIL"
if [ "$FAIL" -gt 0 ]; then
echo "Failed:$FAILED_TESTS"
fi
[ "$FAIL" -eq 0 ]
+1 -1
View File
@@ -56,7 +56,7 @@ on:
# 2. Avoid colliding with the existing :15 sweep-cf-orphans
# and :45 sweep-cf-tunnels — both hit the CF API and we
# don't want to fight for rate-limit tokens.
# 3. Avoid the :30 heavy slot (canary-staging /30, sweep-aws-
# 3. Avoid the :30 heavy slot (staging-smoke /30, sweep-aws-
# secrets, sweep-stale-e2e-orgs every :15) — multiple
# overlapping cron registrations on the same minute is part
# of what GH drops under load.
+6 -3
View File
@@ -124,7 +124,10 @@ jobs:
env:
CANVAS_E2E_STAGING: '1'
MOLECULE_CP_URL: https://staging-api.moleculesai.app
MOLECULE_ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
# 2026-05-11: secret canonicalised from MOLECULE_STAGING_ADMIN_TOKEN
# (dead in org secret store) to CP_STAGING_ADMIN_API_TOKEN per
# internal#322 — see this PR for the cross-workflow sweep.
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
defaults:
run:
@@ -145,7 +148,7 @@ jobs:
if: needs.detect-changes.outputs.canvas == 'true'
run: |
if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
echo "::error::Missing MOLECULE_STAGING_ADMIN_TOKEN"
echo "::error::Missing CP_STAGING_ADMIN_API_TOKEN"
exit 2
fi
@@ -207,7 +210,7 @@ jobs:
- name: Teardown safety net
if: always() && needs.detect-changes.outputs.canvas == 'true'
env:
ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
set +e
STATE_FILE=".playwright-staging-state.json"
+6 -3
View File
@@ -89,7 +89,10 @@ jobs:
env:
MOLECULE_CP_URL: https://staging-api.moleculesai.app
MOLECULE_ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
# 2026-05-11: secret canonicalised from MOLECULE_STAGING_ADMIN_TOKEN
# (dead in org secret store) to CP_STAGING_ADMIN_API_TOKEN per
# internal#322 — see this PR for the cross-workflow sweep.
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
E2E_RUN_ID: "${{ github.run_id }}-${{ github.run_attempt }}"
E2E_KEEP_ORG: ${{ github.event.inputs.keep_org && '1' || '0' }}
E2E_STALE_WAIT_SECS: ${{ github.event.inputs.stale_wait_secs || '180' }}
@@ -104,7 +107,7 @@ jobs:
# missing — silent skip would mask infra rot. Manual dispatch
# gets the same hard-fail; an operator running this on a fork
# without secrets configured needs to know up-front.
echo "::error::MOLECULE_STAGING_ADMIN_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
echo "::error::CP_STAGING_ADMIN_API_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
exit 2
fi
echo "Admin token present ✓"
@@ -129,7 +132,7 @@ jobs:
- name: Teardown safety net (runs on cancel/failure)
if: always()
env:
ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
set +e
orgs=$(curl -sS "$MOLECULE_CP_URL/cp/admin/orgs" \
+7 -4
View File
@@ -86,7 +86,10 @@ jobs:
# Single admin-bearer secret drives provision + tenant-token
# retrieval + teardown. Configure in
# Settings → Secrets and variables → Actions → Repository secrets.
MOLECULE_ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
# 2026-05-11: secret canonicalised from MOLECULE_STAGING_ADMIN_TOKEN
# (dead in org secret store) to CP_STAGING_ADMIN_API_TOKEN per
# internal#322 — see this PR for the cross-workflow sweep.
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
# MiniMax is the PRIMARY LLM auth path post-2026-05-04. Switched
# from hermes+OpenAI default after #2578 (the staging OpenAI key
# account went over quota and stayed dead for 36+ hours, taking
@@ -95,7 +98,7 @@ jobs:
# ANTHROPIC_BASE_URL to api.minimax.io/anthropic and reads
# MINIMAX_API_KEY at boot — separate billing account so an
# OpenAI quota collapse no longer wedges the gate. Mirrors the
# canary-staging.yml + continuous-synth-e2e.yml migrations.
# staging-smoke.yml + continuous-synth-e2e.yml migrations.
E2E_MINIMAX_API_KEY: ${{ secrets.MOLECULE_STAGING_MINIMAX_API_KEY }}
# Direct-Anthropic alternative for operators who don't want to
# set up a MiniMax account (priority below MiniMax — first
@@ -122,7 +125,7 @@ jobs:
- name: Verify admin token present
run: |
if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
echo "::error::MOLECULE_STAGING_ADMIN_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
echo "::error::CP_STAGING_ADMIN_API_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
exit 2
fi
echo "Admin token present ✓"
@@ -189,7 +192,7 @@ jobs:
- name: Teardown safety net (runs on cancel/failure)
if: always()
env:
ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
# Best-effort: find any e2e-YYYYMMDD-* orgs matching this run and
# nuke them. Catches the case where the script died before
+19 -10
View File
@@ -11,11 +11,11 @@ name: E2E Staging Sanity (leak-detection self-check)
# - `continue-on-error: true` on the job (RFC §1 contract).
#
# Periodic assertion that the teardown safety nets in e2e-staging-saas
# and canary-staging actually work. Runs the E2E harness with
# E2E_INTENTIONAL_FAILURE=1, which poisons the tenant admin token after
# the org is provisioned. The workspace-provision step then fails, the
# script exits non-zero, and the EXIT trap + workflow always()-step
# must still tear down cleanly.
# and staging-smoke (formerly canary-staging) actually work. Runs the
# E2E harness with E2E_INTENTIONAL_FAILURE=1, which poisons the tenant
# admin token after the org is provisioned. The workspace-provision
# step then fails, the script exits non-zero, and the EXIT trap +
# workflow always()-step must still tear down cleanly.
on:
schedule:
@@ -42,8 +42,11 @@ jobs:
env:
MOLECULE_CP_URL: https://staging-api.moleculesai.app
MOLECULE_ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
E2E_MODE: canary
# 2026-05-11: secret canonicalised from MOLECULE_STAGING_ADMIN_TOKEN
# (dead in org secret store) to CP_STAGING_ADMIN_API_TOKEN per
# internal#322 — see this PR for the cross-workflow sweep.
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
E2E_MODE: smoke
E2E_RUNTIME: hermes
E2E_RUN_ID: "sanity-${{ github.run_id }}"
E2E_INTENTIONAL_FAILURE: "1"
@@ -54,7 +57,7 @@ jobs:
- name: Verify admin token present
run: |
if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
echo "::error::MOLECULE_STAGING_ADMIN_TOKEN not set"
echo "::error::CP_STAGING_ADMIN_API_TOKEN not set"
exit 2
fi
@@ -118,7 +121,7 @@ jobs:
- name: Teardown safety net
if: always()
env:
ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
set +e
orgs=$(curl -sS "$MOLECULE_CP_URL/cp/admin/orgs" \
@@ -127,8 +130,14 @@ jobs:
import json, sys
d = json.load(sys.stdin)
today = __import__('datetime').date.today().strftime('%Y%m%d')
# Match both the new e2e-smoke- prefix (post-2026-05-11 rename)
# and the legacy e2e-canary- prefix for one rollout cycle so
# any in-flight org provisioned under the old prefix on an
# older runner checkout still gets cleaned up. Remove the
# canary fallback after one week of no-old-prefix observations.
prefixes = (f'e2e-smoke-{today}-sanity-', f'e2e-canary-{today}-sanity-')
candidates = [o['slug'] for o in d.get('orgs', [])
if o.get('slug','').startswith(f'e2e-canary-{today}-sanity-')
if any(o.get('slug','').startswith(p) for p in prefixes)
and o.get('status') not in ('purged',)]
print('\n'.join(candidates))
" 2>/dev/null)
+35 -1
View File
@@ -68,7 +68,35 @@ jobs:
run: ${{ steps.decide.outputs.run }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Fetch base branch tip for diff
continue-on-error: true
run: |
# With the default fetch-depth: 1, actions/checkout only fetches the
# PR head commit. The base commit is NOT in the local history, so
# `git diff "$BASE" "$GITHUB_SHA"` fails. Fetch the base branch at
# depth 1 — the base commit is the immediate parent of the PR head
# on the base branch, so depth=1 is sufficient.
#
# Network: Gitea Actions runner (5.78.80.188) cannot reach the git
# remote over HTTPS (confirmed: git fetch times out at ~15s). The runner
# is on the same host as Gitea, but the container network namespace
# cannot reach the Gitea HTTPS endpoint.
#
# Fallback: if the base commit does not exist locally, skip the diff
# and set run=true (always run harness). This is safe: PRs where the
# base is unavailable still run the harness (correct), PRs where the
# base IS available get the correct path-based diff.
#
# Timeout: 20s. If the fetch completes, great. If it times out, the
# step exits non-zero and we fall through to run=true.
if timeout 20 git fetch origin "${{ github.event.pull_request.base.ref }}" --depth=1; then
echo "::notice::base branch fetched successfully"
else
echo "::warning::git fetch origin ${{ github.event.pull_request.base.ref }} --depth=1 timed out"
echo "::warning::Skipping diff — detect-changes will run the harness unconditionally."
fi
- id: decide
continue-on-error: true
run: |
# workflow_dispatch: always run (manual trigger)
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
@@ -95,7 +123,13 @@ jobs:
fi
# GitHub Actions and Gitea Actions both expose github.sha for HEAD.
DIFF=$(git diff --name-only "$BASE" "${{ github.sha }}" 2>/dev/null)
# git diff exits 1 when BASE is not in local history (e.g. shallow
# checkout where the base commit was never fetched). Capture and
# swallow that exit code — the empty diff means "run everything".
# The runner network cannot reach the git remote (confirmed: git fetch
# times out at ~15s), so a failed fetch is expected and we always fall
# through to the unconditional run=true below.
DIFF=$(git diff --name-only "$BASE" "${{ github.sha }}" 2>/dev/null) || true
echo "debug=diff-base=$BASE diff-files=$DIFF" >> "$GITHUB_OUTPUT"
if echo "$DIFF" | grep -qE '^workspace-server/|^canvas/|^tests/harness/|^.gitea/workflows/harness-replays\.yml$'; then
+1 -1
View File
@@ -11,7 +11,7 @@ name: publish-canvas-image
# - `continue-on-error: true` on each job (RFC §1 contract).
# - **Open question for review**: this workflow pushes the canvas
# image to `ghcr.io`. GHCR was retired during the 2026-05-06
# Gitea migration in favor of ECR (per canary-verify.yml header
# Gitea migration in favor of ECR (per staging-verify.yml header
# notes). The image may not be consumable post-migration. Two
# options for follow-up: (a) retarget to
# `153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/canvas`,
+12 -4
View File
@@ -43,10 +43,18 @@ jobs:
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
# Fetch full tag list so the bump logic can sanity-check against
# what's already in this repo (catches collision with prior
# manual tag pushes).
fetch-depth: 0
# Shallow clone — depth 1 is enough for the workspace-diff check.
# Tags needed for the collision check below are fetched explicitly
# in the next step, bypassing the runner-network timeout that
# full-history fetch triggers on Gitea Actions runners
# (runbooks/gitea-operational-quirks.md §runner-network-isolation).
fetch-depth: 1
- name: Fetch tags for collision check
# fetch-depth: 1 gets only the most recent commit's refs, not the
# tag that points at it. Do a targeted tag fetch so git tag --list
# below can detect collision with prior manual pushes.
run: git fetch origin --tags --depth=1
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
@@ -32,7 +32,7 @@ name: redeploy-tenants-on-main
#
# Registry: ECR (153263036946.dkr.ecr.us-east-2.amazonaws.com/
# molecule-ai/platform-tenant). GHCR was retired 2026-05-07 during the
# Gitea suspension migration. The canary-verify.yml promote step now
# Gitea suspension migration. The staging-verify.yml promote step now
# uses the same redeploy-fleet endpoint (fixes the silent-GHCR gap).
#
# Runtime ordering:
@@ -104,7 +104,7 @@ jobs:
# `staging-<sha>` to roll back to a known-good build.
# 2. Default → `staging-<short_head_sha>`. The just-published
# digest. Bypasses the `:latest` retag path that's currently
# dead (canary-verify soft-skips without canary fleet, so
# dead (staging-verify soft-skips without canary fleet, so
# the only thing retagging `:latest` today is the manual
# promote-latest.yml — last run 2026-04-28). Auto-trigger
# from workflow_run uses workflow_run.head_sha; manual
@@ -359,7 +359,7 @@ jobs:
# Belt-and-suspenders sanity floor: same logic as the staging
# variant — see that file's comment for the full rationale.
# Floor only applies when fleet >= 4; below that, canary-verify
# Floor only applies when fleet >= 4; below that, staging-verify
# is the actual gate.
TOTAL_VERIFIED=${#SLUGS[@]}
if [ $TOTAL_VERIFIED -ge 4 ] && [ $UNREACHABLE_COUNT -gt $((TOTAL_VERIFIED / 2)) ]; then
@@ -21,7 +21,7 @@ name: redeploy-tenants-on-staging
#
# Mirror of redeploy-tenants-on-main.yml, with the staging-CP host and
# the :staging-latest tag. Sister workflow exists for prod (rolls
# :latest after canary-verify). Both share the same shape — just
# :latest after staging-verify). Both share the same shape — just
# different CP_URL + target_tag + admin token secret.
#
# Why this workflow exists: publish-workspace-server-image now builds
@@ -336,7 +336,7 @@ jobs:
# crashes on startup), not a teardown race. Hard-fail.
#
# Floor only applies when TOTAL_VERIFIED >= 4 — below that, the
# canary-verify step is the actual gate for "all tenants down"
# staging-verify step is the actual gate for "all tenants down"
# detection (it runs against the canary first and aborts the
# rollout if the canary fails to come up). Without the >=4 gate,
# a 1-tenant fleet (e.g. a single ephemeral e2e-* tenant on a
+79
View File
@@ -0,0 +1,79 @@
# sop-tier-refire — issue_comment-triggered refire of sop-tier-check.
#
# Closes internal#292. Gitea 1.22.6 doesn't refire workflows on the
# `pull_request_review` event (go-gitea/gitea#33700); the `sop-tier-check`
# workflow's review-event subscription is silently dead. The result:
# PRs that get their approving review AFTER the tier-check ran on open/
# synchronize keep their failing status check forever, and the only way
# to merge is the admin force-merge path (audited via `audit-force-merge`
# but the audit trail keeps growing; see `feedback_never_admin_merge_bypass`).
#
# Workaround pattern from `feedback_pull_request_review_no_refire`:
# `issue_comment` events DO fire reliably on 1.22.6. When a repo
# MEMBER/OWNER/COLLABORATOR comments `/refire-tier-check` on a PR, this
# workflow re-runs the sop-tier-check logic and POSTs the resulting
# status to the PR head SHA directly. No empty commit, no git history
# bloat, no cascade re-fire of every other workflow on the PR.
#
# SECURITY MODEL:
#
# 1. `pull_request` exists on the issue (issue_comment fires on issues
# AND PRs; we only want PRs).
# 2. `comment.author_association` must be MEMBER/OWNER/COLLABORATOR.
# Per the internal#292 core-security review (review#1066 ask): anyone
# can comment, but only repo collaborators+ can flip the status.
# Without this gate, a drive-by commenter on a public-issue-tracker
# surface could trigger a status flip.
# 3. Comment body must contain `/refire-tier-check` — a slash-command-
# shaped trigger (not just any comment word). Prevents accidental
# triggering from prose like "we should refire tests" in a review.
# 4. This workflow does NOT check out PR HEAD code. Like sop-tier-check,
# it only HTTP-calls the Gitea API. Trust boundary preserved.
#
# Note: `issue_comment` fires from the BASE branch's workflow file. There
# is no `pull_request_target` equivalent to set; the trigger inherently
# loads the workflow from the default branch.
#
# Rate-limit: a 1s pre-sleep + a "skip if status posted in last 30s"
# guard prevents comment-spam from thrashing the status. See the script.
name: sop-tier-check refire (issue_comment)
on:
issue_comment:
types: [created]
jobs:
refire:
# Three gates, all required:
# - comment is on a PR (not a plain issue)
# - commenter is MEMBER, OWNER, or COLLABORATOR
# - comment body contains the slash-command trigger
if: |
github.event.issue.pull_request != null &&
contains(fromJson('["MEMBER","OWNER","COLLABORATOR"]'), github.event.comment.author_association) &&
contains(github.event.comment.body, '/refire-tier-check')
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: read
statuses: write
steps:
- name: Check out base branch (for the script)
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
# Load the script from the default branch (main), matching the
# sop-tier-check.yml security model.
ref: ${{ github.event.repository.default_branch }}
- name: Re-evaluate sop-tier-check and POST status
env:
# Same org-level secret sop-tier-check.yml + audit-force-merge.yml use.
# Fallback to GITHUB_TOKEN with a clear error if missing.
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
GITEA_HOST: git.moleculesai.app
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.issue.number }}
COMMENT_AUTHOR: ${{ github.event.comment.user.login }}
# Set to '1' for diagnostic per-API-call output. Off by default.
SOP_DEBUG: '0'
run: bash .gitea/scripts/sop-tier-refire.sh
@@ -1,6 +1,8 @@
name: Canary — staging SaaS smoke (every 30 min)
name: Staging SaaS smoke (every 30 min)
# Ported from .github/workflows/canary-staging.yml on 2026-05-11 per RFC
# Renamed from canary-staging.yml on 2026-05-11 per Hongming directive
# ("canary naming changed to staging for all"). Originally ported from
# .github/workflows/canary-staging.yml on 2026-05-11 per RFC
# internal#219 §1 sweep. Differences from the GitHub version:
# - Dropped `workflow_dispatch.inputs` (Gitea 1.22.6 parser rejects them
# per feedback_gitea_workflow_dispatch_inputs_unsupported).
@@ -21,21 +23,21 @@ name: Canary — staging SaaS smoke (every 30 min)
# catches drift in the 30-min window between those runs (AMI health, CF
# cert rotation, WorkOS session stability, etc.).
#
# Lean mode: E2E_MODE=canary skips the child workspace + HMA memory +
# Lean mode: E2E_MODE=smoke skips the child workspace + HMA memory +
# peers/activity checks. One parent workspace + one A2A turn is enough
# to signal "SaaS stack end-to-end is alive."
on:
schedule:
# Every 30 min. Cron on GitHub-hosted runners has a known drift of
# a few minutes under load — that's fine for a canary.
# a few minutes under load — that's fine for a smoke check.
- cron: '*/30 * * * *'
# Serialise with the full-SaaS workflow so they don't contend for the
# same org-create quota on staging. Different group key from
# e2e-staging-saas since we don't mind queueing canaries behind one
# full run, but two canaries SHOULD queue against each other.
# e2e-staging-saas since we don't mind queueing smoke runs behind one
# full run, but two smoke runs SHOULD queue against each other.
concurrency:
group: canary-staging
group: staging-smoke
cancel-in-progress: false
permissions:
@@ -47,32 +49,47 @@ env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
canary:
name: Canary smoke
smoke:
name: Staging SaaS smoke
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
continue-on-error: true
# NOTE: Phase 3 (RFC #219 §1) `continue-on-error: true` removed
# 2026-05-11. The "surface broken workflows without blocking"
# rationale was correctly applied to advisory/lint workflows but
# wrong for this smoke — it is the 30-min canary cadence for the
# entire staging SaaS stack, and silent failure here masks the
# exact regressions the smoke exists to surface (AMI rot, CF cert
# drift, WorkOS session breakage, secret rotations). Same class of
# failure as PR#461 (`sweep-stale-e2e-orgs`) where Phase-3 silent
# failure leaked EC2. The four other `e2e-staging-*` workflows
# KEEP `continue-on-error: true` per RFC #219 §1 — they are
# advisory and matrix-style; this one is the canary. A follow-up
# `notify-failure` step below also surfaces breakage to ops even
# if branch-protection wiring is adjusted to keep this off the
# required-checks list.
# 25 min headroom over the 15-min TLS-readiness deadline in
# tests/e2e/test_staging_full_saas.sh (#2107). Without the buffer
# the job is killed at the wall-clock 15:00 mark BEFORE the bash
# `fail` + diagnostic burst can fire, leaving every cancellation
# silent. Sibling staging E2E jobs run at 20-45 min — keeping
# canary tighter than them so a true wedge still surfaces here
# silent. Sibling staging E2E jobs run at 20-45 min — keeping the
# smoke tighter than them so a true wedge still surfaces here
# first.
timeout-minutes: 25
env:
MOLECULE_CP_URL: https://staging-api.moleculesai.app
MOLECULE_ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
# MiniMax is the canary's PRIMARY LLM auth path post-2026-05-04.
# 2026-05-11: secret canonicalised from MOLECULE_STAGING_ADMIN_TOKEN
# (dead in org secret store) to CP_STAGING_ADMIN_API_TOKEN per
# internal#322 — see this PR for the cross-workflow sweep.
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
# MiniMax is the smoke's PRIMARY LLM auth path post-2026-05-04.
# Switched from hermes+OpenAI after #2578 (the staging OpenAI key
# account went over quota and stayed dead for 36+ hours, taking
# the canary red the entire time). claude-code template's
# the smoke red the entire time). claude-code template's
# `minimax` provider routes ANTHROPIC_BASE_URL to
# api.minimax.io/anthropic and reads MINIMAX_API_KEY at boot —
# ~5-10x cheaper per token than gpt-4.1-mini AND on a separate
# billing account, so OpenAI quota collapse no longer wedges the
# canary. Mirrors the migration continuous-synth-e2e.yml made on
# smoke. Mirrors the migration continuous-synth-e2e.yml made on
# 2026-05-03 (#265) for the same reason. tests/e2e/test_staging_
# full_saas.sh branches SECRETS_JSON on which key is present —
# MiniMax wins when set.
@@ -86,16 +103,16 @@ jobs:
# E2E_RUNTIME=hermes overridden via workflow_dispatch can still
# exercise the OpenAI path without re-editing the workflow.
E2E_OPENAI_API_KEY: ${{ secrets.MOLECULE_STAGING_OPENAI_API_KEY }}
E2E_MODE: canary
E2E_MODE: smoke
E2E_RUNTIME: claude-code
# Pin the canary to a specific MiniMax model rather than relying
# Pin the smoke to a specific MiniMax model rather than relying
# on the per-runtime default (which could resolve to "sonnet" →
# direct Anthropic and defeat the cost saving). M2.7-highspeed
# is "Token Plan only" but cheap-per-token and fast.
E2E_MODEL_SLUG: MiniMax-M2.7-highspeed
E2E_RUN_ID: "canary-${{ github.run_id }}"
E2E_RUN_ID: "smoke-${{ github.run_id }}"
# Debug-only: when an operator dispatches with keep_on_failure=true,
# the canary script's E2E_KEEP_ORG=1 path skips teardown so the
# the smoke script's E2E_KEEP_ORG=1 path skips teardown so the
# tenant org + EC2 stay alive for SSM-based log capture. Cron runs
# never set this (the input only exists on workflow_dispatch) so
# unattended cron always tears down. See molecule-core#129
@@ -109,7 +126,7 @@ jobs:
- name: Verify admin token present
run: |
if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
echo "::error::MOLECULE_STAGING_ADMIN_TOKEN not set"
echo "::error::CP_STAGING_ADMIN_API_TOKEN not set"
exit 2
fi
@@ -119,7 +136,7 @@ jobs:
# langgraph (operator-dispatched only) use OpenAI. Hard-fail
# rather than soft-skip per the lesson from synth E2E #2578:
# an empty key silently falls through to the wrong
# SECRETS_JSON branch and the canary fails 5 min later with
# SECRETS_JSON branch and the smoke fails 5 min later with
# a confusing auth error instead of the clean "secret
# missing" message at the top.
case "${E2E_RUNTIME}" in
@@ -155,8 +172,8 @@ jobs:
fi
echo "LLM key present ✓ (runtime=${E2E_RUNTIME}, key=${required_secret_name}, len=${#required_secret_value})"
- name: Canary run
id: canary
- name: Smoke run
id: smoke
run: bash tests/e2e/test_staging_full_saas.sh
# Alerting: open a sticky issue on the FIRST failure; comment on
@@ -184,6 +201,9 @@ jobs:
run: |
set -euo pipefail
API="${SERVER_URL%/}/api/v1"
# Title kept stable across the canary-staging.yml → staging-smoke.yml
# rename (2026-05-11) so any open alert issue from the old name
# still title-matches and auto-closes on the next green run.
TITLE="Canary failing: staging SaaS smoke"
RUN_URL="${SERVER_URL}/${REPO}/actions/runs/${RUN_ID}"
@@ -194,18 +214,18 @@ jobs:
if [ -n "$EXISTING" ]; then
curl -fsS -X POST -H "Authorization: token $GITEA_TOKEN" -H "Content-Type: application/json" \
"${API}/repos/${REPO}/issues/${EXISTING}/comments" \
-d "$(jq -nc --arg run "$RUN_URL" '{body: ("Canary still failing. " + $run)}')" >/dev/null
-d "$(jq -nc --arg run "$RUN_URL" '{body: ("Smoke still failing. " + $run)}')" >/dev/null
echo "Commented on existing issue #${EXISTING}"
else
NOW=$(date -u +%Y-%m-%dT%H:%M:%SZ)
BODY=$(jq -nc --arg t "$TITLE" --arg now "$NOW" --arg run "$RUN_URL" \
'{title: $t, body: ("Canary run failed at " + $now + ".\n\nRun: " + $run + "\n\nThis issue auto-closes on the next green canary run. Consecutive failures add a comment here rather than a new issue.")}')
'{title: $t, body: ("Smoke run failed at " + $now + ".\n\nRun: " + $run + "\n\nThis issue auto-closes on the next green smoke run. Consecutive failures add a comment here rather than a new issue.")}')
curl -fsS -X POST -H "Authorization: token $GITEA_TOKEN" -H "Content-Type: application/json" \
"${API}/repos/${REPO}/issues" -d "$BODY" >/dev/null
echo "Opened canary failure issue (first red)"
echo "Opened smoke failure issue (first red)"
fi
- name: Auto-close canary issue on success (Gitea API)
- name: Auto-close smoke issue on success (Gitea API)
if: success()
env:
GITEA_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -215,6 +235,8 @@ jobs:
run: |
set -euo pipefail
API="${SERVER_URL%/}/api/v1"
# Title kept stable across the canary-staging.yml → staging-smoke.yml
# rename so open alert issues from the old name still match.
TITLE="Canary failing: staging SaaS smoke"
NUMS=$(curl -fsS -H "Authorization: token $GITEA_TOKEN" \
@@ -225,37 +247,36 @@ jobs:
for N in $NUMS; do
curl -fsS -X POST -H "Authorization: token $GITEA_TOKEN" -H "Content-Type: application/json" \
"${API}/repos/${REPO}/issues/${N}/comments" \
-d "$(jq -nc --arg now "$NOW" '{body: ("Canary recovered at " + $now + ". Closing.")}')" >/dev/null
-d "$(jq -nc --arg now "$NOW" '{body: ("Smoke recovered at " + $now + ". Closing.")}')" >/dev/null
curl -fsS -X PATCH -H "Authorization: token $GITEA_TOKEN" -H "Content-Type: application/json" \
"${API}/repos/${REPO}/issues/${N}" -d '{"state":"closed"}' >/dev/null
echo "Closed recovered canary issue #${N}"
echo "Closed recovered smoke issue #${N}"
done
- name: Teardown safety net
if: always()
env:
ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
set +e
# Slug prefix matches what test_staging_full_saas.sh emits
# in canary mode:
# SLUG="e2e-canary-$(date +%Y%m%d)-${RUN_ID_SUFFIX}"
# Earlier this was `e2e-{today}-canary-` — that was the
# full-mode pattern (date FIRST, mode SECOND); canary slugs
# have mode FIRST, date SECOND. The mismatch silently
# never matched, leaving every cancelled-canary EC2 alive
# until the once-an-hour sweep eventually caught it
# (incident 2026-04-26 21:03Z: 1h25m EC2 leak before manual
# cleanup; same gap on three earlier cancellations today).
# in smoke mode:
# SLUG="e2e-smoke-$(date +%Y%m%d)-${RUN_ID_SUFFIX}"
# Earlier (pre-2026-05-11 canary→staging rename) the prefix was
# `e2e-canary-`; both prefixes are matched here for one
# release cycle so cleanup still catches any in-flight org
# provisioned under the old prefix on an older runner that
# hasn't picked up the renamed script. Remove the canary
# fallback after one week of no-old-prefix observations.
orgs=$(curl -sS "$MOLECULE_CP_URL/cp/admin/orgs" \
-H "Authorization: Bearer $ADMIN_TOKEN" 2>/dev/null \
| python3 -c "
import json, sys, os, datetime
run_id = os.environ.get('GITHUB_RUN_ID', '')
d = json.load(sys.stdin)
# Scope to slugs from THIS canary run when GITHUB_RUN_ID is
# available; the canary workflow sets E2E_RUN_ID='canary-\${run_id}'
# so the slug suffix is '-canary-\${run_id}-...'. Mirrors the
# Scope to slugs from THIS smoke run when GITHUB_RUN_ID is
# available; the smoke workflow sets E2E_RUN_ID='smoke-\${run_id}'
# so the slug suffix is '-smoke-\${run_id}-...'. Mirrors the
# full-mode safety net's per-run scoping (e2e-staging-saas.yml)
# added after the 2026-04-21 cross-run cleanup incident.
# Sweep both today AND yesterday's UTC dates so a run that
@@ -265,9 +286,11 @@ jobs:
yesterday = today - datetime.timedelta(days=1)
dates = (today.strftime('%Y%m%d'), yesterday.strftime('%Y%m%d'))
if run_id:
prefixes = tuple(f'e2e-canary-{d}-canary-{run_id}' for d in dates)
prefixes = tuple(f'e2e-smoke-{d}-smoke-{run_id}' for d in dates) \
+ tuple(f'e2e-canary-{d}-canary-{run_id}' for d in dates)
else:
prefixes = tuple(f'e2e-canary-{d}-' for d in dates)
prefixes = tuple(f'e2e-smoke-{d}-' for d in dates) \
+ tuple(f'e2e-canary-{d}-' for d in dates)
candidates = [o['slug'] for o in d.get('orgs', [])
if any(o.get('slug','').startswith(p) for p in prefixes)
and o.get('status') not in ('purged',)]
@@ -280,8 +303,8 @@ jobs:
# stale sweep caught it (up to 2h later). Now we capture the
# response code and surface non-2xx as a workflow warning, so
# the run page shows which slug leaked. We still don't `exit 1`
# on cleanup failure — a single-canary cleanup miss shouldn't
# fail-flag the canary itself when the actual smoke check
# on cleanup failure — a single-smoke cleanup miss shouldn't
# fail-flag the smoke itself when the actual smoke check
# passed. The sweep-stale-e2e-orgs cron (now every 15 min,
# 30-min threshold) is the safety net for whatever slips past.
# See molecule-controlplane#420.
@@ -290,21 +313,34 @@ jobs:
# Tempfile-routed -w + set +e/-e prevents curl-exit-code
# pollution of the captured status (lint-curl-status-capture.yml).
set +e
curl -sS -o /tmp/canary-cleanup.out -w "%{http_code}" \
curl -sS -o /tmp/smoke-cleanup.out -w "%{http_code}" \
-X DELETE "$MOLECULE_CP_URL/cp/admin/tenants/$slug" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"confirm\":\"$slug\"}" >/tmp/canary-cleanup.code
-d "{\"confirm\":\"$slug\"}" >/tmp/smoke-cleanup.code
set -e
code=$(cat /tmp/canary-cleanup.code 2>/dev/null || echo "000")
code=$(cat /tmp/smoke-cleanup.code 2>/dev/null || echo "000")
if [ "$code" = "200" ] || [ "$code" = "204" ]; then
echo "[teardown] deleted $slug (HTTP $code)"
else
echo "::warning::canary teardown for $slug returned HTTP $code — sweep-stale-e2e-orgs will catch it within ~45 min. Body: $(head -c 300 /tmp/canary-cleanup.out 2>/dev/null)"
echo "::warning::smoke teardown for $slug returned HTTP $code — sweep-stale-e2e-orgs will catch it within ~45 min. Body: $(head -c 300 /tmp/smoke-cleanup.out 2>/dev/null)"
leaks+=("$slug")
fi
done
if [ ${#leaks[@]} -gt 0 ]; then
echo "::warning::canary teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
echo "::warning::smoke teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
fi
exit 0
- name: Notify on smoke failure
# Fail-loud companion to dropping `continue-on-error: true`.
# The Open-issue-on-failure step above handles the human-facing
# alert; this step emits a clearly-tagged ::error:: line that
# log-tail consumers (Loki SOPRefireRule, orchestrator triage
# loop) can grep on. Mirrors PR#461's sweep-stale-e2e-orgs
# pattern. Runs AFTER the teardown safety net (which is
# if: always()) so failures don't suppress cleanup.
if: failure()
run: |
echo "::error::staging-smoke FAILED — staging SaaS canary is red. See prior step logs + the auto-filed alert issue. Common causes: (a) CP_STAGING_ADMIN_API_TOKEN secret missing/rotated, (b) staging-api.moleculesai.app 5xx, (c) MiniMax/Anthropic LLM key dead, (d) AMI/CF/WorkOS drift. The 30-min cron will retry, but a chronic red here indicates the staging SaaS stack is broken end-to-end."
exit 1
@@ -1,6 +1,8 @@
name: canary-verify
name: Staging verify
# Ported from .github/workflows/canary-verify.yml on 2026-05-11 per RFC
# Renamed from canary-verify.yml on 2026-05-11 per Hongming directive
# ("canary naming changed to staging for all"). Originally ported from
# .github/workflows/canary-verify.yml on 2026-05-11 per RFC
# internal#219 §1 sweep. Differences from the GitHub version:
# - Dropped `workflow_dispatch.inputs` (Gitea 1.22.6 parser rejects them
# per feedback_gitea_workflow_dispatch_inputs_unsupported).
@@ -23,13 +25,22 @@ name: canary-verify
# digest. On red, :latest stays on the prior known-good digest and
# prod is untouched.
#
# Terminology note (2026-05-11): The deployment STRATEGY here is still
# called "canary release" (a small subset of tenants gets the new image
# first, the rest follow on green). The "canary" word stays for the
# pre-fan-out cohort concept (see docs/architecture/canary-release.md
# and CANARY_SLUG in redeploy-tenants-on-*.yml). What changed is the
# FILE NAME and the SECRETS feeding this workflow — both are renamed
# to drop the redundant "canary-" prefix that conflated workflow
# identity with deployment strategy.
#
# Registry note (2026-05-10): This workflow previously used GHCR
# (ghcr.io/molecule-ai/platform-tenant) — that registry was retired
# during the 2026-05-06 Gitea suspension migration when publish-
# workspace-server-image.yml switched to the operator's ECR org
# (153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/
# platform-tenant). The GHCR → ECR migration was never applied to
# this file, so canary-verify was silently smoke-testing the stale
# this file, so this workflow was silently smoke-testing the stale
# GHCR image while the actual staging/prod tenants ran the ECR image.
# Result: smoke tests could not catch a broken ECR build. Fix:
# - Wait step: reads SHA from running canary /health (tenant-
@@ -43,8 +54,9 @@ name: canary-verify
# to ECR on staging and main merges.
# - Canary tenants are configured to pull :staging-<sha> from ECR
# (TENANT_IMAGE env set to the ECR :staging-<sha> tag).
# - Repo secrets CANARY_TENANT_URLS / CANARY_ADMIN_TOKENS /
# CANARY_CP_SHARED_SECRET are populated.
# - Repo secrets MOLECULE_STAGING_TENANT_URLS /
# MOLECULE_STAGING_ADMIN_TOKENS / MOLECULE_STAGING_CP_SHARED_SECRET
# are populated.
on:
workflow_run:
@@ -65,7 +77,7 @@ env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
canary-smoke:
staging-smoke:
# Skip when the upstream workflow failed — no image to test against.
# workflow_dispatch trigger dropped in this Gitea port; only the
# workflow_run path remains.
@@ -97,15 +109,15 @@ jobs:
# other registry — the canary is telling us what it's actually
# running, which is the ground truth for smoke testing.
env:
CANARY_TENANT_URLS: ${{ secrets.CANARY_TENANT_URLS }}
MOLECULE_STAGING_TENANT_URLS: ${{ secrets.MOLECULE_STAGING_TENANT_URLS }}
EXPECTED_SHA: ${{ steps.compute.outputs.sha }}
run: |
if [ -z "$CANARY_TENANT_URLS" ]; then
if [ -z "$MOLECULE_STAGING_TENANT_URLS" ]; then
echo "No canary URLs configured — falling back to 60s wait"
sleep 60
exit 0
fi
IFS=',' read -ra URLS <<< "$CANARY_TENANT_URLS"
IFS=',' read -ra URLS <<< "$MOLECULE_STAGING_TENANT_URLS"
MAX_WAIT=420 # 7 minutes
INTERVAL=30
ELAPSED=0
@@ -129,7 +141,7 @@ jobs:
done
echo "Timeout after ${MAX_WAIT}s — proceeding anyway (smoke suite will validate)"
- name: Run canary smoke suite
- name: Run staging smoke suite
id: smoke
# Graceful-skip when no canary fleet is configured (Phase 2 not yet
# stood up — see molecule-controlplane/docs/canary-tenants.md).
@@ -138,29 +150,29 @@ jobs:
# promote-latest.yml is the release gate while canary is absent.
# Once the fleet is real: delete the early-exit branch.
env:
CANARY_TENANT_URLS: ${{ secrets.CANARY_TENANT_URLS }}
CANARY_ADMIN_TOKENS: ${{ secrets.CANARY_ADMIN_TOKENS }}
CANARY_CP_BASE_URL: https://staging-api.moleculesai.app
CANARY_CP_SHARED_SECRET: ${{ secrets.CANARY_CP_SHARED_SECRET }}
MOLECULE_STAGING_TENANT_URLS: ${{ secrets.MOLECULE_STAGING_TENANT_URLS }}
MOLECULE_STAGING_ADMIN_TOKENS: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKENS }}
MOLECULE_STAGING_CP_BASE_URL: https://staging-api.moleculesai.app
MOLECULE_STAGING_CP_SHARED_SECRET: ${{ secrets.MOLECULE_STAGING_CP_SHARED_SECRET }}
run: |
set -euo pipefail
if [ -z "${CANARY_TENANT_URLS:-}" ] \
|| [ -z "${CANARY_ADMIN_TOKENS:-}" ] \
|| [ -z "${CANARY_CP_SHARED_SECRET:-}" ]; then
if [ -z "${MOLECULE_STAGING_TENANT_URLS:-}" ] \
|| [ -z "${MOLECULE_STAGING_ADMIN_TOKENS:-}" ] \
|| [ -z "${MOLECULE_STAGING_CP_SHARED_SECRET:-}" ]; then
{
echo "## ⚠️ canary-verify skipped"
echo "## ⚠️ staging-verify skipped"
echo
echo "One or more canary secrets are unset (\`CANARY_TENANT_URLS\`, \`CANARY_ADMIN_TOKENS\`, \`CANARY_CP_SHARED_SECRET\`)."
echo "One or more canary secrets are unset (\`MOLECULE_STAGING_TENANT_URLS\`, \`MOLECULE_STAGING_ADMIN_TOKENS\`, \`MOLECULE_STAGING_CP_SHARED_SECRET\`)."
echo "Phase 2 canary fleet has not been stood up yet —"
echo "see [canary-tenants.md](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/docs/canary-tenants.md)."
echo
echo "**Skipped — promote-to-latest will NOT auto-fire.** Dispatch \`promote-latest.yml\` manually when ready."
} >> "$GITHUB_STEP_SUMMARY"
echo "ran=false" >> "$GITHUB_OUTPUT"
echo "::notice::canary-verify: skipped — no canary fleet configured"
echo "::notice::staging-verify: skipped — no canary fleet configured"
exit 0
fi
bash scripts/canary-smoke.sh
bash scripts/staging-smoke.sh
echo "ran=true" >> "$GITHUB_OUTPUT"
- name: Summary on failure
@@ -173,7 +185,7 @@ jobs:
echo ":latest stays pinned to the prior good digest — prod is untouched."
echo
echo "Fix forward and merge again, or investigate the specific failed"
echo "assertions in the canary-smoke step log above."
echo "assertions in the staging-smoke step log above."
} >> "$GITHUB_STEP_SUMMARY"
promote-to-latest:
@@ -188,13 +200,13 @@ jobs:
# silently promoting a stale GHCR image while actual prod tenants
# pulled from ECR. Canary smoke tests were GHCR-targeted and could
# not catch a broken ECR build.
needs: canary-smoke
if: ${{ needs.canary-smoke.result == 'success' && needs.canary-smoke.outputs.smoke_ran == 'true' }}
needs: staging-smoke
if: ${{ needs.staging-smoke.result == 'success' && needs.staging-smoke.outputs.smoke_ran == 'true' }}
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
continue-on-error: true
env:
SHA: ${{ needs.canary-smoke.outputs.sha }}
SHA: ${{ needs.staging-smoke.outputs.sha }}
CP_URL: ${{ vars.CP_URL || 'https://staging-api.moleculesai.app' }}
# CP_ADMIN_API_TOKEN gates write access to the redeploy endpoint.
# Stored at the repo level so all workflows pick it up automatically.
@@ -264,9 +276,9 @@ jobs:
- name: Summary
run: |
{
echo "## Canary verified — :latest promoted via CP redeploy-fleet"
echo "## Staging verified — :latest promoted via CP redeploy-fleet"
echo ""
echo "- **Target tag:** \`staging-${{ needs.canary-smoke.outputs.sha }}\`"
echo "- **Target tag:** \`staging-${{ needs.staging-smoke.outputs.sha }}\`"
echo "- **Registry:** ECR (\`${TENANT_IMAGE_NAME}\`)"
echo "- **Canary slug:** \`${CANARY_SLUG:-<none>}\` (soak ${SOAK_SECONDS}s)"
echo "- **Batch size:** ${BATCH_SIZE:-3}"
+29 -5
View File
@@ -63,12 +63,21 @@ jobs:
sweep:
name: Sweep e2e orgs
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
continue-on-error: true
# NOTE: Phase 3 (RFC #219 §1) `continue-on-error: true` removed
# 2026-05-11. The "surface broken workflows without blocking"
# rationale was correctly applied to advisory/lint workflows but
# wrong for this janitor — silent failure here masks real-money
# tenant leaks. Hongming observed 15 leaked EC2 in molecule-canary
# (004947743811) us-east-2 at 11:05Z 2026-05-11 because the sweep
# had been exiting 2 every tick and the failure was swallowed.
# See `feedback_strict_root_only_after_class_a` — critical janitors
# must fail loud. A follow-up `notify-failure` step below also
# surfaces breakage to ops even if branch-protection wiring is
# adjusted to keep this off the required-checks list.
timeout-minutes: 15
env:
MOLECULE_CP_URL: https://staging-api.moleculesai.app
ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
MAX_AGE_MINUTES: ${{ github.event.inputs.max_age_minutes || '30' }}
DRY_RUN: ${{ github.event.inputs.dry_run || 'false' }}
# Refuse to delete more than this many orgs in one tick. If the
@@ -81,7 +90,7 @@ jobs:
- name: Verify admin token present
run: |
if [ -z "$ADMIN_TOKEN" ]; then
echo "::error::MOLECULE_STAGING_ADMIN_TOKEN not set"
echo "::error::CP_STAGING_ADMIN_API_TOKEN not set"
exit 2
fi
echo "Admin token present ✓"
@@ -99,7 +108,8 @@ jobs:
# Filter:
# 1. slug starts with one of the ephemeral test prefixes:
# - 'e2e-' — covers e2e-canary-, e2e-canvas-*, etc.
# - 'e2e-' — covers e2e-smoke- (formerly e2e-canary-),
# e2e-canvas-*, etc.
# - 'rt-e2e-' — runtime-test harness fixtures (RFC #2251);
# missing this prefix left two such tenants
# orphaned 8h on staging (2026-05-03), then
@@ -241,3 +251,17 @@ jobs:
if: env.DRY_RUN == 'true'
run: |
echo "DRY RUN — would have deleted ${{ steps.identify.outputs.count }} org(s) AND triggered orphan-tunnels cleanup. Re-run with dry_run=false to actually delete."
- name: Notify on sweep failure
# Fail-loud companion to dropping `continue-on-error: true`.
# If any prior step failed (missing token, CP 5xx, safety-cap
# tripped, etc.) emit a clearly-tagged ::error:: line so the
# Gitea runs UI + any log-tail consumer (Loki SOPRefireRule)
# flags this. Without this step, an early `exit 2` shows as a
# red run but the message can scroll past in busy log windows;
# the explicit tag here is greppable from the orchestrator
# triage loop.
if: failure()
run: |
echo "::error::sweep-stale-e2e-orgs FAILED — staging tenants are LEAKING. See prior step logs. Common causes: (a) CP_STAGING_ADMIN_API_TOKEN secret missing/rotated, (b) staging-api.moleculesai.app 5xx, (c) safety-cap tripped (CP admin API returning malformed orgs). Manual cleanup of leaked EC2 + DNS may be required while this is broken."
exit 1
@@ -41,9 +41,10 @@ const pendingApproval = (id = "a1", workspaceId = "ws-1"): {
created_at: "2026-05-10T10:00:00Z",
});
// Shared spy reference so individual tests can call mockGet.mockRestore()
// without needing to pass it through beforeEach → it scope chain.
// Shared spy references so individual tests can reset or reject the POST mock
// without needing to call spyOn again (which would create a duplicate spy).
let mockGet: ReturnType<typeof vi.spyOn>;
let mockPost: ReturnType<typeof vi.spyOn>;
// ─── Tests ────────────────────────────────────────────────────────────────────
@@ -139,8 +140,8 @@ describe("ApprovalBanner — renders approval cards", () => {
describe("ApprovalBanner — decisions", () => {
beforeEach(() => {
vi.useFakeTimers();
vi.spyOn(api, "get").mockResolvedValueOnce([pendingApproval("a1")]);
vi.spyOn(api, "post").mockResolvedValue({});
mockGet = vi.spyOn(api, "get").mockResolvedValueOnce([pendingApproval("a1")]);
mockPost = vi.spyOn(api, "post").mockResolvedValue({});
});
afterEach(() => {
@@ -196,7 +197,7 @@ describe("ApprovalBanner — decisions", () => {
});
it("shows an error toast when POST fails", async () => {
vi.mocked(api.post).mockRejectedValueOnce(new Error("Network error"));
mockPost.mockReset().mockRejectedValue(new Error("Network error"));
render(<ApprovalBanner />);
await act(async () => { await vi.runOnlyPendingTimersAsync(); });
fireEvent.click(screen.getAllByRole("button", { name: /approve/i })[0]);
@@ -208,8 +209,9 @@ describe("ApprovalBanner — decisions", () => {
});
it("keeps the card visible when the POST fails", async () => {
// Use mockRejectedValueOnce on the same spy as beforeEach (don't call spyOn again)
vi.mocked(api.post).mockRejectedValueOnce(new Error("Network error"));
// Reset the post mock before rejecting so the beforeEach's resolved value
// is gone and we get a clean rejection instead of a resolved→rejected queue.
mockPost.mockReset().mockRejectedValue(new Error("Network error"));
render(<ApprovalBanner />);
await act(async () => { await vi.runOnlyPendingTimersAsync(); });
fireEvent.click(screen.getAllByRole("button", { name: /approve/i })[0]);
@@ -0,0 +1,291 @@
// @vitest-environment jsdom
/**
* Toolbar tests.
*
* Covers:
* - Renders with 0 workspaces
* - Shows online/offline/failed/provisioning status pills when nodes exist
* - WebSocket status pill: connected → "Live"
* - WebSocket status pill: connecting → "Reconnecting"
* - WebSocket status pill: disconnected → "Offline"
* - Stop All button visible when activeTasks > 0
* - Restart Pending button visible when needsRestart nodes exist
* - Help button opens the help popover
* - Help popover closes on Escape or pointer-outside
* - KeyboardShortcutsDialog opens via ? shortcut (when not in input)
*/
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { render, screen, fireEvent, cleanup } from "@testing-library/react";
import React from "react";
afterEach(() => {
cleanup();
vi.clearAllMocks();
});
// Reset store state between tests so mutations don't leak.
beforeEach(() => {
defaultStore.nodes = [];
defaultStore.wsStatus = "connected";
defaultStore.showA2AEdges = false;
defaultStore.selectedNodeId = null;
mockSetShowA2AEdges.mockClear();
mockSetPanelTab.mockClear();
mockSetSearchOpen.mockClear();
mockUpdateNodeData.mockClear();
});
// ── Mock targets ───────────────────────────────────────────────────────────────
vi.mock("@/components/Toaster", () => ({
showToast: vi.fn(),
}));
vi.mock("@/components/ConfirmDialog", () => ({
ConfirmDialog: () => null,
}));
vi.mock("@/components/settings/SettingsButton", () => ({
SettingsButton: () => null,
}));
vi.mock("@/components/settings/SettingsPanel", () => ({
settingsGearRef: { current: null },
}));
vi.mock("@/components/ThemeToggle", () => ({
ThemeToggle: () => null,
}));
vi.mock("@/components/KeyboardShortcutsDialog", () => ({
KeyboardShortcutsDialog: ({ open }: { open: boolean; onClose: () => void }) =>
open ? <div role="dialog" data-testid="shortcuts-dialog">Shortcuts</div> : null,
}));
vi.mock("@/lib/design-tokens", () => ({
statusDotClass: (status: string) => {
const map: Record<string, string> = {
online: "bg-emerald-400",
offline: "bg-zinc-500",
paused: "bg-indigo-400",
degraded: "bg-amber-400",
failed: "bg-red-400",
provisioning: "bg-sky-400",
};
return map[status] ?? "bg-zinc-500";
},
}));
vi.mock("@/lib/api", () => ({
api: {
post: vi.fn(() => Promise.resolve()),
},
}));
// ── Store mocks ────────────────────────────────────────────────────────────────
const mockSetShowA2AEdges = vi.fn();
const mockSetPanelTab = vi.fn();
const mockSetSearchOpen = vi.fn();
const mockUpdateNodeData = vi.fn();
const makeNodes = (
statuses: Array<"online" | "offline" | "failed" | "provisioning">,
activeTasks: number[] = [],
needsRestart: boolean[] = [],
parentIds: (string | null)[] = []
) => {
return statuses.map((status, i) => ({
id: `ws-${i}`,
data: {
name: `Workspace ${i}`,
role: "agent",
tier: 1,
status,
parentId: parentIds[i] ?? null,
activeTasks: activeTasks[i] ?? 0,
needsRestart: needsRestart[i] ?? false,
},
}));
};
// Nodes must be React Flow nodes (id + data), but Toolbar only reads data fields.
// makeNodes returns { id, data: { activeTasks, needsRestart, ... } }.
const toStoreNodes = (nodes: ReturnType<typeof makeNodes>) =>
nodes.map((n) => ({ id: n.id, data: n.data }));
const defaultStore = {
nodes: [] as ReturnType<typeof makeNodes>,
wsStatus: "connected" as "connected" | "connecting" | "disconnected",
showA2AEdges: false,
selectedNodeId: null as string | null,
sidePanelWidth: 480,
setShowA2AEdges: mockSetShowA2AEdges,
setPanelTab: mockSetPanelTab,
setSearchOpen: mockSetSearchOpen,
updateNodeData: mockUpdateNodeData,
selectedNodeIds: new Set<string>(),
clearSelection: vi.fn(),
batchRestart: vi.fn(() => Promise.resolve()),
batchPause: vi.fn(() => Promise.resolve()),
batchDelete: vi.fn(() => Promise.resolve()),
};
vi.mock("@/store/canvas", () => ({
useCanvasStore: vi.fn((selector: (s: typeof defaultStore) => unknown) =>
selector(defaultStore)
),
}));
// ── Component under test ───────────────────────────────────────────────────────
import { Toolbar } from "../Toolbar";
// ── Tests ─────────────────────────────────────────────────────────────────────
describe("Toolbar — workspace count display", () => {
it("shows '0 workspaces' when the canvas is empty", () => {
render(<Toolbar />);
expect(screen.getByText(/0 workspaces?/)).toBeTruthy();
});
it("shows 'N workspaces' when nodes exist", () => {
defaultStore.nodes = toStoreNodes(makeNodes(["online", "online"]));
render(<Toolbar />);
expect(screen.getByText(/2 workspaces?/)).toBeTruthy();
});
});
describe("Toolbar — status pills", () => {
it("shows the online pill when nodes are online", () => {
defaultStore.nodes = toStoreNodes(makeNodes(["online"]));
render(<Toolbar />);
// StatusPill uses aria-label
expect(screen.getByLabelText(/1 online/i)).toBeTruthy();
});
it("shows the offline pill only when offline nodes exist", () => {
defaultStore.nodes = toStoreNodes(makeNodes(["offline"]));
render(<Toolbar />);
expect(screen.getByLabelText(/1 offline/i)).toBeTruthy();
});
it("shows the failed pill when failed nodes exist", () => {
defaultStore.nodes = toStoreNodes(makeNodes(["failed"]));
render(<Toolbar />);
expect(screen.getByLabelText(/1 failed/i)).toBeTruthy();
});
it("shows the provisioning pill when provisioning nodes exist", () => {
defaultStore.nodes = toStoreNodes(makeNodes(["provisioning"]));
render(<Toolbar />);
expect(screen.getByLabelText(/1 starting/i)).toBeTruthy();
});
it("suppresses offline pill when no offline nodes", () => {
defaultStore.nodes = toStoreNodes(makeNodes(["online", "online"]));
render(<Toolbar />);
expect(screen.queryByLabelText(/offline/i)).toBeNull();
});
});
describe("Toolbar — WebSocket status pill", () => {
it('shows "Live" when connected', () => {
defaultStore.wsStatus = "connected";
render(<Toolbar />);
expect(screen.getByText("Live")).toBeTruthy();
});
it('shows "Reconnecting" when connecting', () => {
defaultStore.wsStatus = "connecting";
render(<Toolbar />);
expect(screen.getByText("Reconnecting")).toBeTruthy();
});
it('shows "Offline" when disconnected', () => {
defaultStore.wsStatus = "disconnected";
render(<Toolbar />);
expect(screen.getByText("Offline")).toBeTruthy();
});
});
describe("Toolbar — Stop All", () => {
it("is hidden when no active tasks", () => {
defaultStore.nodes = toStoreNodes(makeNodes(["online"], [0]));
render(<Toolbar />);
expect(screen.queryByRole("button", { name: /Stop All/i })).toBeNull();
});
it("is visible when active tasks > 0", () => {
defaultStore.nodes = toStoreNodes(makeNodes(["online", "online"], [2, 2]));
render(<Toolbar />);
// aria-label: "Stop all running tasks (2)"
expect(screen.getByRole("button", { name: /stop all running tasks/i })).toBeTruthy();
});
});
describe("Toolbar — Restart Pending", () => {
it("is hidden when no nodes need restart", () => {
defaultStore.nodes = toStoreNodes(makeNodes(["online"], [], [false]));
render(<Toolbar />);
expect(screen.queryByRole("button", { name: /Restart Pending/i })).toBeNull();
});
it("is visible when nodes need restart", () => {
defaultStore.nodes = toStoreNodes(makeNodes(["online"], [], [true]));
render(<Toolbar />);
// aria-label: "Restart 1 workspaces pending config or secret changes"
expect(screen.getByRole("button", { name: /restart 1 workspace/i })).toBeTruthy();
});
});
describe("Toolbar — Help popover", () => {
it("opens when help button is clicked", () => {
render(<Toolbar />);
const helpBtn = screen.getByRole("button", { name: /open shortcuts and tips/i });
fireEvent.click(helpBtn);
expect(screen.getByRole("dialog")).toBeTruthy();
});
it("closes when close button is clicked", () => {
render(<Toolbar />);
const helpBtn = screen.getByRole("button", { name: /open shortcuts and tips/i });
fireEvent.click(helpBtn);
expect(screen.getByRole("dialog")).toBeTruthy();
const closeBtn = screen.getByRole("button", { name: /close help dialog/i });
fireEvent.click(closeBtn);
expect(screen.queryByRole("dialog")).toBeNull();
});
});
describe("Toolbar — A2A edges toggle", () => {
it("calls setShowA2AEdges on click", () => {
defaultStore.showA2AEdges = false;
render(<Toolbar />);
const toggle = screen.getByRole("button", { name: /show a2a edges/i });
fireEvent.click(toggle);
expect(mockSetShowA2AEdges).toHaveBeenCalledWith(true);
});
});
describe("Toolbar — ? shortcut opens shortcuts dialog", () => {
it("opens KeyboardShortcutsDialog when ? is pressed outside an input", () => {
render(<Toolbar />);
expect(screen.queryByTestId("shortcuts-dialog")).toBeNull();
fireEvent.keyDown(window, { key: "?" });
expect(screen.getByTestId("shortcuts-dialog")).toBeTruthy();
});
it("does not fire ? shortcut when focus is in an input", () => {
render(
<div>
<input data-testid="test-input" type="text" />
<Toolbar />
</div>
);
const input = screen.getByTestId("test-input");
fireEvent.focus(input);
// Fire on the input element so e.target.tagName === "INPUT" is true
fireEvent.keyDown(input, { key: "?" });
expect(screen.queryByTestId("shortcuts-dialog")).toBeNull();
});
});
@@ -55,10 +55,10 @@ describe("statusDotClass", () => {
describe("TIER_CONFIG", () => {
it("has entries for all four tier levels", () => {
expect(TIER_CONFIG).toHaveProperty(1);
expect(TIER_CONFIG).toHaveProperty(2);
expect(TIER_CONFIG).toHaveProperty(3);
expect(TIER_CONFIG).toHaveProperty(4);
expect(TIER_CONFIG).toHaveProperty("1");
expect(TIER_CONFIG).toHaveProperty("2");
expect(TIER_CONFIG).toHaveProperty("3");
expect(TIER_CONFIG).toHaveProperty("4");
});
it("each tier has label, color, and border fields", () => {
+5 -5
View File
@@ -2,7 +2,7 @@
How a workspace-server code change reaches the prod tenant fleet — and how to stop it if something's wrong.
> **⚠️ State note (2026-04-22):** this doc describes the **intended design**. As of this write, the canary fleet described below is **not actually running** — no canary tenants are provisioned, `CANARY_TENANT_URLS` / `CANARY_ADMIN_TOKENS` / `CANARY_CP_SHARED_SECRET` are empty in repo secrets, and `canary-verify.yml` fails every run.
> **⚠️ State note (2026-04-22, secret names refreshed 2026-05-11):** this doc describes the **intended design**. As of this write, the canary fleet described below is **not actually running** — no canary tenants are provisioned, `MOLECULE_STAGING_TENANT_URLS` / `MOLECULE_STAGING_ADMIN_TOKENS` / `MOLECULE_STAGING_CP_SHARED_SECRET` are empty in repo secrets, and `staging-verify.yml` (formerly `canary-verify.yml`) fails every run.
>
> Current merges gate on manual `promote-latest.yml` dispatches, not canary. See [molecule-controlplane/docs/canary-tenants.md](https://git.moleculesai.app/molecule-ai/molecule-controlplane/src/branch/main/docs/canary-tenants.md) for the Phase 1 code work that's already shipped + the Phase 2 plan for actually standing up the fleet + a "should we even do this now?" decision framework.
>
@@ -22,7 +22,7 @@ publish-workspace-server-image.yml ← pushes :staging-<sha> ONLY
Canary tenants auto-update to :staging-<sha>
│ (5-min auto-updater cycle on each canary EC2)
canary-verify.yml waits 6 min, runs scripts/canary-smoke.sh
staging-verify.yml waits 6 min, runs scripts/staging-smoke.sh
├─► GREEN → crane tag :staging-<sha> → :latest
│ │
@@ -42,7 +42,7 @@ Canary tenants are configured to pull `:staging-<sha>` (not `:latest`) via `TENA
## Smoke suite
`scripts/canary-smoke.sh` hits each canary tenant (URL + ADMIN_TOKEN pair) and asserts:
`scripts/staging-smoke.sh` hits each canary tenant (URL + ADMIN_TOKEN pair) and asserts:
- `/admin/liveness` returns a subsystems map (tenant booted, AdminAuth reachable)
- `/workspaces` returns a JSON array (wsAuth + DB healthy)
@@ -59,8 +59,8 @@ Expand by editing the script — each `check "name" "expected" "$response"` call
3. Re-trigger provision (or delete + recreate if the org was already provisioned into staging) — the fresh EC2 lands in the canary AWS account (see internal runbook for the specific ID)
Then set repo secrets:
- `CANARY_TENANT_URLS` — append the new tenant's URL
- `CANARY_ADMIN_TOKENS` — append its ADMIN_TOKEN in the same position
- `MOLECULE_STAGING_TENANT_URLS` — append the new tenant's URL
- `MOLECULE_STAGING_ADMIN_TOKENS` — append its ADMIN_TOKEN in the same position
## Rolling back `:latest`
@@ -50,7 +50,7 @@ pipeline.
| `check-merge-group-trigger.yml` | The workflow's own header (lines 18-23) documents that it's vacuously satisfied on Gitea — Gitea has no merge queue, no `merge_group:` event type, no `gh-readonly-queue/...` refs. Nothing to lint. |
| `codeql.yml` | The workflow's own header (lines 3-67) documents that `github/codeql-action/init@v4` hits api.github.com bundle endpoints not implemented by Gitea (observed: `::error::404 page not found` in Initialize CodeQL step). Per Hongming decision 2026-05-07 (task #156): CodeQL is ADVISORY/non-blocking until a Gitea-compatible SAST pipeline lands. Replacement options (Semgrep self-host, Sonatype, GitHub-mirror-for-SAST) tracked in #156. |
| `pr-guards.yml` | The workflow's own header documents that Gitea has no `gh pr merge --auto` primitive — the guard is a structural no-op on Gitea. Branch protection on `main` does NOT reference any `pr-guards` check name; deletion is safe. |
| `promote-latest.yml` | Uses `imjasonh/setup-crane` against `ghcr.io/molecule-ai/platform` — the GHCR registry was retired during the 2026-05-06 Gitea migration (per `canary-verify.yml` header notes, the canonical tenant image moved to ECR `153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/platform-tenant`). The workflow can no longer find any image to retag. Follow-up issue suggested if an ECR-based retag promote is desired. |
| `promote-latest.yml` | Uses `imjasonh/setup-crane` against `ghcr.io/molecule-ai/platform` — the GHCR registry was retired during the 2026-05-06 Gitea migration (per `staging-verify.yml` header notes — file was renamed from `canary-verify.yml` on 2026-05-11; the canonical tenant image moved to ECR `153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/platform-tenant`). The workflow can no longer find any image to retag. Follow-up issue suggested if an ECR-based retag promote is desired. |
## Category C — ported to .gitea/
+150
View File
@@ -0,0 +1,150 @@
# Gitea Actions operational quirks (molecule-core)
Documents persistent operational findings about Gitea Actions runner behaviour
that differ from GitHub Actions and require workarounds in workflow YAML or
runbooks.
> Last updated: 2026-05-11 (core-devops-agent)
---
## Gitea 1.22.6 runner network isolation
### Finding
The Gitea Actions runner (container on host `5.78.80.188`) cannot reach the
git remote (`https://git.moleculesai.app`) over HTTPS from inside the runner
container. Any `git fetch`, `git clone`, or `git push` command that contacts
the remote times out at 1215 s.
This is **not a Gitea Actions bug** — it is an operator-level network policy
where the runner container's network namespace is restricted from reaching the
Gitea host HTTPS endpoint. The runner can reach external hosts (GitHub,
Docker Hub, PyPI) normally.
### Impact
Workflows that rely on `git fetch origin <ref>` or `actions/checkout` with
`fetch-depth: 0` (full history) will hang or time out.
Specifically:
- `actions/checkout@v*` with `fetch-depth: 0` hangs (fetching full repo
history takes >30 s before hitting the timeout).
- `git fetch origin main --depth=1` times out at ~15 s.
- `git clone <url>` times out at ~15 s.
### Affected workflows
| Workflow | Issue | Workaround |
|---|---|---|
| `harness-replays.yml` detect-changes job | `git fetch origin main --depth=1` times out | Added `timeout 20` + graceful fallback to `run=true` (always run harness) per PR #441 |
| `publish-workspace-server-image.yml` | In-image `git clone` of workspace templates | Pre-clone manifest deps before compose build (Task #173 pattern) |
| Any workflow using `fetch-depth: 0` | Full history fetch times out | Use `fetch-depth: 1` + explicit `git fetch` for needed refs |
### How to diagnose
```bash
# From inside the runner (add as a debug step):
timeout 20 git fetch origin main --depth=1
# If this times out: runner cannot reach git remote
```
### Verification
Confirmed 2026-05-11 by running `timeout 20 git fetch origin main --depth=1`
in the `detect-changes` job of `harness-replays.yml` — consistently times
out at 15 s. Runner can reach `https://api.github.com` and `https://pypi.org`
without issue.
### References
- PR #441: fix for `harness-replays.yml` detect-changes
- Task #173: pre-clone manifest deps pattern for compose build
- internal#102: tracking customer-private + marketplace third-party repos
- `feedback_oss_first_repo_visibility_default`: 5 workspace-template repos
flipped public to allow pre-clone without auth
---
## `continue-on-error` only works at step level, not job level
### Finding
Gitea Actions (1.22.6) does not honour `continue-on-error: true` at the **job**
level the way GitHub Actions does. A job with `continue-on-error: true` that
fails still reports `status: failure` in the commit status API.
Only `continue-on-error: true` at the **step** level works as expected.
### Impact
If you want a job to always "pass" in the status API (so dependent jobs can
run and the overall CI does not show `failure`), you must add
`continue-on-error: true` to every step that can fail, AND ensure each step
exits with code 0 (e.g., append `|| true` to commands that might fail).
### Affected workflows
| Workflow | Fix |
|---|---|
| `harness-replays.yml` detect-changes | Added `continue-on-error: true` to fetch step + decide step; added `|| true` to `DIFF=$(git diff ...)` per PR #441 |
### How to diagnose
```yaml
# WRONG — job reports as failure despite flag
jobs:
my-job:
continue-on-error: true # ← ignored by Gitea
steps:
- run: git diff ... # ← if this fails, job = failure
# job-level flag does not help
# RIGHT — step-level flag prevents step from failing
jobs:
my-job:
steps:
- run: git diff ... || true # ← step exits 0
continue-on-error: true # ← belt and suspenders
```
### References
- Gitea Actions quirk #10 (from migration checklist)
- PR #441: fix applied to `harness-replays.yml`
---
## `workflow_dispatch.inputs` not supported
Gitea 1.22.6 parser rejects `workflow_dispatch.inputs`. Drop from all workflow
YAML files ported from GitHub Actions. Manual triggers should use
`workflow_dispatch` without `inputs:`.
**Reference**: `feedback_gitea_workflow_dispatch_inputs_unsupported`
---
## `merge_group` not supported
Gitea has no merge queue concept. Drop `merge_group:` triggers from all
workflow YAML files.
---
## `environment:` blocks not supported
Gitea has no environments concept. Drop `environment:` from all workflow YAML
files. Secrets and variables are repo-level.
---
## `fetch-depth: 0` on `actions/checkout` times out
`actions/checkout` with `fetch-depth: 0` triggers a full repo history fetch
which exceeds the runner's network timeout to the git remote (~15 s).
**Workaround**: Use `fetch-depth: 1` (default) and add explicit
`git fetch origin <ref> --depth=1` for any additional refs needed.
**Reference**: PR #441 detect-changes fetch step.
+1 -1
View File
@@ -43,7 +43,7 @@ endpoint handler for the supported range.
- `cleanup-rogue-workspaces.sh` — emergency teardown for leaked
workspaces. Prompts for confirmation. Pair with the harnesses if a
cleanup trap fails (see `cleanup_*_failed` events).
- `canary-smoke.sh` — quick smoke test for canary releases.
- `staging-smoke.sh` — quick smoke test for the staging canary fleet (formerly `canary-smoke.sh`).
- `dev-start.sh` — local-dev platform bring-up.
The rest are self-documenting in their header comments.
@@ -1,29 +1,40 @@
#!/bin/bash
# canary-smoke.sh — runs the post-deploy smoke suite against the
# staging canary tenant fleet. Called by the canary-verify.yml GitHub
# staging-smoke.sh — runs the post-deploy smoke suite against the
# staging canary tenant fleet. Called by the staging-verify.yml Gitea
# Actions workflow after a new workspace-server image lands in ECR;
# exits non-zero on any failure so the workflow can block the
# redeploy-fleet promotion that would otherwise release broken code
# to the prod tenant fleet.
#
# Naming note (2026-05-11): The script (and its input env vars) were
# renamed from canary-smoke.sh / CANARY_* to staging-smoke.sh /
# MOLECULE_STAGING_* per Hongming directive. The tested COHORT is still
# called the "canary fleet" (a small subset of staging tenants that
# ingest :staging-<sha> before the rest of the fleet); that strategy
# concept is unchanged.
#
# Registry note: GHCR was retired 2026-05-06. Images are now pushed
# to the operator's ECR org (153263036946.dkr.ecr.us-east-2.amazonaws.com/
# molecule-ai/platform-tenant). The registry URL is a runtime concern for
# the CI push step; this script tests the running tenant directly.
#
# Environment:
# CANARY_TENANT_URLS space-sep list of canary tenant base URLs
# (e.g. "https://canary-pm.staging.moleculesai.app
# https://canary-mcp.staging.moleculesai.app")
# CANARY_ADMIN_TOKENS space-sep list of ADMIN_TOKENs, positionally
# matched to CANARY_TENANT_URLS. Canary tenants
# are provisioned with known ADMIN_TOKENs so CI
# can hit their admin-gated endpoints.
# CANARY_CP_BASE_URL CP base URL the canaries call back to
# (https://staging-api.moleculesai.app)
# CANARY_CP_SHARED_SECRET matches CP's PROVISION_SHARED_SECRET so this
# script can also exercise /cp/workspaces/* via
# the canary's own CPProvisioner identity.
# MOLECULE_STAGING_TENANT_URLS space-sep list of canary tenant base
# URLs (e.g. "https://canary-pm.staging.
# moleculesai.app https://canary-mcp.
# staging.moleculesai.app")
# MOLECULE_STAGING_ADMIN_TOKENS space-sep list of ADMIN_TOKENs,
# positionally matched to
# MOLECULE_STAGING_TENANT_URLS.
# Canary tenants are provisioned with
# known ADMIN_TOKENs so CI can hit
# their admin-gated endpoints.
# MOLECULE_STAGING_CP_BASE_URL CP base URL the canaries call back to
# (https://staging-api.moleculesai.app)
# MOLECULE_STAGING_CP_SHARED_SECRET matches CP's PROVISION_SHARED_SECRET
# so this script can also exercise
# /cp/workspaces/* via the canary's
# own CPProvisioner identity.
#
# Exit codes: 0 = all green, 1 = assertion failure, 2 = setup/env problem.
@@ -31,12 +42,12 @@ set -euo pipefail
# ── Setup ────────────────────────────────────────────────────────────────
: "${CANARY_TENANT_URLS:?space-sep list of canary base URLs required}"
: "${CANARY_ADMIN_TOKENS:?space-sep list of ADMIN_TOKENs required, same order as URLs}"
: "${CANARY_CP_BASE_URL:?CP base URL required}"
: "${MOLECULE_STAGING_TENANT_URLS:?space-sep list of canary base URLs required}"
: "${MOLECULE_STAGING_ADMIN_TOKENS:?space-sep list of ADMIN_TOKENs required, same order as URLs}"
: "${MOLECULE_STAGING_CP_BASE_URL:?CP base URL required}"
read -r -a URLS <<< "$CANARY_TENANT_URLS"
read -r -a TOKENS <<< "$CANARY_ADMIN_TOKENS"
read -r -a URLS <<< "$MOLECULE_STAGING_TENANT_URLS"
read -r -a TOKENS <<< "$MOLECULE_STAGING_ADMIN_TOKENS"
if [ "${#URLS[@]}" -ne "${#TOKENS[@]}" ]; then
echo "ERROR: URLS(${#URLS[@]}) and TOKENS(${#TOKENS[@]}) length mismatch" >&2
@@ -69,7 +80,7 @@ check() {
# tenant never gets the wrong token.
acurl() {
local base="$1" token="$2"; shift 2
curl -sS --max-time 20 -H "Authorization: Bearer $token" "$@" -- "$base${CANARY_ACURL_PATH:-}"
curl -sS --max-time 20 -H "Authorization: Bearer $token" "$@" -- "$base${ACURL_PATH:-}"
}
# ── Checks (run per canary tenant) ───────────────────────────────────────
@@ -80,7 +91,7 @@ for i in "${!URLS[@]}"; do
printf "\n── %s ──\n" "$base"
# 1. Liveness — the tenant is up and responding to admin auth.
CANARY_ACURL_PATH="/admin/liveness" resp=$(acurl "$base" "$token" || true)
ACURL_PATH="/admin/liveness" resp=$(acurl "$base" "$token" || true)
check "liveness returns a subsystems map" '"subsystems"' "$resp"
# 2. CP env refresh — the workspace-server fetched MOLECULE_CP_SHARED_SECRET
@@ -89,25 +100,25 @@ for i in "${!URLS[@]}"; do
# booted without crashing on the refresh call. A startup failure in
# refreshEnvFromCP logs but still boots (best-effort semantics), so
# this is a sanity check, not a proof.
CANARY_ACURL_PATH="/workspaces" resp=$(acurl "$base" "$token" || true)
ACURL_PATH="/workspaces" resp=$(acurl "$base" "$token" || true)
check "workspace list is JSON array" "[" "$resp"
# 3. Memory commit round-trip — scope=LOCAL so test data stays on this
# tenant. Verifies encryption + scrubber + retrieval end-to-end.
probe_id="canary-smoke-$(date +%s)-$i"
body=$(printf '{"scope":"LOCAL","namespace":"canary-smoke","content":"probe-%s"}' "$probe_id")
CANARY_ACURL_PATH="/memories/commit" resp=$(curl -sS --max-time 20 \
ACURL_PATH="/memories/commit" resp=$(curl -sS --max-time 20 \
-X POST -H "Content-Type: application/json" -H "Authorization: Bearer $token" \
--data "$body" "$base/memories/commit" || true)
check "memory commit accepted" '"id"' "$resp"
CANARY_ACURL_PATH="/memories/search?query=probe-${probe_id}" \
ACURL_PATH="/memories/search?query=probe-${probe_id}" \
resp=$(curl -sS --max-time 20 -H "Authorization: Bearer $token" \
"$base/memories/search?query=probe-${probe_id}" || true)
check "memory search finds the probe" "probe-${probe_id}" "$resp"
# 4. Events admin read — AdminAuth path (C4 fail-closed proof on SaaS).
CANARY_ACURL_PATH="/events" resp=$(acurl "$base" "$token" || true)
ACURL_PATH="/events" resp=$(acurl "$base" "$token" || true)
check "events endpoint returns JSON" "[" "$resp"
# 5. Negative: unauth'd admin call must 401 (C4 regression gate).
@@ -117,7 +128,7 @@ for i in "${!URLS[@]}"; do
# 6. POST /org/import unauth → 401. Proves the route is compiled in
# and AdminAuth is enforced. A missing route returns 404 (the failure
# mode caught by issue #213). Regression guard for the silent-GHCR-
# migration gap: canary-verify was testing a stale GHCR image while
# migration gap: staging-verify (formerly canary-verify) was testing a stale GHCR image while
# actual tenants ran ECR — this test would have caught a missing-route
# binary before it reached prod.
unauth_code=$(curl -sS -o /dev/null -w '%{http_code}' \
+12 -4
View File
@@ -7,11 +7,11 @@ Four workflows + a shared bash harness that together cover the SaaS stack end to
| Workflow | Cadence | Wall time | Scope |
|---|---|---|---|
| `e2e-staging-saas.yml` | push + nightly 07:00 UTC | ~20 min | Full API: org → tenant → 2 workspaces → A2A → HMA → delegation → leak check |
| `canary-staging.yml` | every 30 min | ~8 min | Minimum smoke + self-managed alert issue |
| `staging-smoke.yml` | every 30 min | ~8 min | Minimum smoke + self-managed alert issue |
| `e2e-staging-canvas.yml` | push + weekly Sunday 08:00 | ~25 min | All 13 canvas workspace-panel tabs via Playwright |
| `e2e-staging-sanity.yml` | weekly Monday 06:00 | ~10 min | Intentional-failure: teardown safety-net self-check |
`tests/e2e/test_staging_full_saas.sh` is the shared harness all workflows invoke (with `E2E_MODE={full|canary}` and `E2E_INTENTIONAL_FAILURE={0|1}` toggles).
`tests/e2e/test_staging_full_saas.sh` is the shared harness all workflows invoke (with `E2E_MODE={full|smoke}` and `E2E_INTENTIONAL_FAILURE={0|1}` toggles).
### Full-SaaS checklist (sections)
@@ -49,7 +49,15 @@ Runs the harness with `E2E_INTENTIONAL_FAILURE=1`, which poisons the tenant admi
Set in **Settings → Secrets and variables → Actions → Repository secrets**:
### `MOLECULE_STAGING_ADMIN_TOKEN`
### `CP_STAGING_ADMIN_API_TOKEN`
> **Historical-rename note (2026-05-11):** previously named
> `MOLECULE_STAGING_ADMIN_TOKEN`. Canonicalised to
> `CP_STAGING_ADMIN_API_TOKEN` per internal#322 (the Railway staging
> service exposes it as `CP_ADMIN_API_TOKEN`; the `CP_*` repo-secret
> prefix matches the upstream env name + makes the service it talks
> to obvious in workflow YAMLs). See the original PR for the
> cross-workflow sweep.
The `CP_ADMIN_API_TOKEN` env currently set on the Railway staging molecule-platform → controlplane service.
@@ -82,7 +90,7 @@ bash tests/e2e/test_staging_full_saas.sh
## Cost
- Full run: ~20 min, ~$0.007
- Canary (48/day): ~$0.06/day
- Smoke (48/day): ~$0.06/day
- Canvas (few/week): ~$0.01/day
- Sanity (weekly): ~$0.002/week
- **Total staging burn: < $0.15/day** at expected CI load
+18 -6
View File
@@ -27,7 +27,11 @@
# E2E_PROVISION_TIMEOUT_SECS default 900 (15 min cold EC2 budget)
# E2E_KEEP_ORG 1 → skip teardown (debugging only)
# E2E_RUN_ID Slug suffix; CI: ${GITHUB_RUN_ID}
# E2E_MODE full (default) | canary
# E2E_MODE full (default) | smoke
# (legacy alias `canary` still accepted —
# mapped to `smoke` for back-compat with
# any in-flight runner picking up an older
# workflow checkout)
# E2E_INTENTIONAL_FAILURE 1 → poison tenant token mid-run so the
# script fails; the EXIT trap MUST still
# tear down cleanly (and exit 4 on leak).
@@ -49,15 +53,23 @@ RUNTIME="${E2E_RUNTIME:-hermes}"
PROVISION_TIMEOUT_SECS="${E2E_PROVISION_TIMEOUT_SECS:-900}"
RUN_ID_SUFFIX="${E2E_RUN_ID:-$(date +%H%M%S)-$$}"
MODE="${E2E_MODE:-full}"
# `canary` is a legacy alias for `smoke` retained for back-compat with
# any in-flight runner picking up an older workflow checkout during the
# 2026-05-11 canary→staging rename rollout. Both map to the same slug
# prefix below. Remove the `canary` alias after one week of no-old-mode
# observations.
if [ "$MODE" = "canary" ]; then
MODE="smoke"
fi
case "$MODE" in
full|canary) ;;
*) echo "E2E_MODE must be 'full' or 'canary' (got: $MODE)" >&2; exit 2 ;;
full|smoke) ;;
*) echo "E2E_MODE must be 'full' or 'smoke' (got: $MODE)" >&2; exit 2 ;;
esac
# Canary runs get a distinct prefix so their safety-net sweeper only
# Smoke runs get a distinct slug prefix so their safety-net sweeper only
# touches their own runs, not in-flight full runs.
if [ "$MODE" = "canary" ]; then
SLUG="e2e-canary-$(date +%Y%m%d)-${RUN_ID_SUFFIX}"
if [ "$MODE" = "smoke" ]; then
SLUG="e2e-smoke-$(date +%Y%m%d)-${RUN_ID_SUFFIX}"
else
SLUG="e2e-$(date +%Y%m%d)-${RUN_ID_SUFFIX}"
fi
+4 -4
View File
@@ -25,10 +25,10 @@ _WORKSPACE_ID_raw = os.environ.get("WORKSPACE_ID")
if not _WORKSPACE_ID_raw:
raise RuntimeError("WORKSPACE_ID environment variable is required but not set")
WORKSPACE_ID = _WORKSPACE_ID_raw
if os.path.exists("/.dockerenv") or os.environ.get("DOCKER_VERSION"):
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
else:
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://localhost:8080")
# Platform URL: always host.docker.internal inside containers. The platform API
# is only reachable via the Docker network mesh from inside a workspace
# container regardless of the runtime environment (Docker/host).
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
async def discover(target_id: str) -> dict | None:
+4 -4
View File
@@ -26,10 +26,10 @@ _WORKSPACE_ID_raw = os.environ.get("WORKSPACE_ID")
if not _WORKSPACE_ID_raw:
raise RuntimeError("WORKSPACE_ID environment variable is required but not set")
WORKSPACE_ID = _WORKSPACE_ID_raw
if os.path.exists("/.dockerenv") or os.environ.get("DOCKER_VERSION"):
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
else:
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://localhost:8080")
# Platform URL: always host.docker.internal inside containers. The platform API
# is only reachable via the Docker network mesh from inside a workspace
# container regardless of the runtime environment (Docker/host).
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
# Cache workspace ID → name mappings (populated by list_peers calls)
_peer_names: dict[str, str] = {}
+16 -4
View File
@@ -54,6 +54,18 @@ import httpx
logger = logging.getLogger(__name__)
def _platform_url() -> str:
"""Return the platform URL, defaulting to host.docker.internal.
The workspace runtime always runs inside a Docker container, so
``localhost`` refers to the container itself, not the platform host.
The platform API is only reachable via ``host.docker.internal`` from
within a workspace container, regardless of how the container was started.
"""
return os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
# ─────────────────────────────────────────────────────────────────────────────
# Constants
# ─────────────────────────────────────────────────────────────────────────────
@@ -79,12 +91,12 @@ async def _fetch_latest_checkpoint(workspace_id: str) -> Optional[dict]:
workspace_id: The workspace to query.
Reads:
PLATFORM_URL Platform base URL (default ``http://localhost:8080``).
PLATFORM_URL Platform base URL (default ``http://host.docker.internal:8080``).
"""
try:
from platform_auth import auth_headers as _auth_headers # type: ignore[import]
platform_url = os.environ.get("PLATFORM_URL", "http://localhost:8080")
platform_url = _platform_url()
url = f"{platform_url}/workspaces/{workspace_id}/checkpoints/latest"
async with httpx.AsyncClient(timeout=5.0) as client:
resp = await client.get(url, headers=_auth_headers())
@@ -125,12 +137,12 @@ async def _save_checkpoint(
payload: Optional JSON-serialisable dict stored as JSONB.
Reads:
PLATFORM_URL Platform base URL (default ``http://localhost:8080``).
PLATFORM_URL Platform base URL (default ``http://host.docker.internal:8080``).
"""
try:
from platform_auth import auth_headers as _auth_headers # type: ignore[import]
platform_url = os.environ.get("PLATFORM_URL", "http://localhost:8080")
platform_url = _platform_url()
url = f"{platform_url}/workspaces/{workspace_id}/checkpoints"
body: dict = {
"workflow_id": workflow_id,
+30 -14
View File
@@ -48,6 +48,27 @@ def get_machine_ip() -> str: # pragma: no cover
return "127.0.0.1"
def _check_delegation_results_pending() -> bool:
"""Check if there are unconsumed delegation results waiting.
Reads ``DELEGATION_RESULTS_FILE``. Returns ``True`` if the file
exists and contains non-whitespace content (after stripping) — meaning
the idle loop should skip this tick. Returns ``False`` if the file is
absent, empty, or contains only whitespace.
The extracted form lets unit tests call this directly rather than mirroring
the logic (anti-pattern flagged as #401).
"""
from heartbeat import DELEGATION_RESULTS_FILE
try:
with open(DELEGATION_RESULTS_FILE) as rf:
rf.seek(0)
return bool(rf.read().strip())
except FileNotFoundError:
return False
# Re-exported from transcript_auth for the inline /transcript handler.
# Separate module keeps the security-critical gate import-light + unit-testable.
from transcript_auth import transcript_authorized as _transcript_authorized
@@ -678,20 +699,15 @@ async def main(): # pragma: no cover
# heartbeat's own self-message wake the agent after results are
# written. The agent then sees the results in _prepare_prompt()
# and processes them before composing.
from heartbeat import DELEGATION_RESULTS_FILE as _DRF
try:
with open(_DRF) as _rf:
_rf.seek(0)
_content = _rf.read().strip()
if _content:
print(
f"Idle loop: skipping — {len(_content)} bytes of unconsumed "
f"delegation results pending (heartbeat will notify agent)",
flush=True,
)
continue
except FileNotFoundError:
pass # No results file — normal, proceed with idle prompt
# Guard logic extracted to _check_delegation_results_pending() for
# direct unit-testing (#401 follow-up).
if _check_delegation_results_pending():
print(
"Idle loop: skipping — unconsumed delegation results pending "
"(heartbeat will notify agent)",
flush=True,
)
continue
# Self-post the idle prompt via the platform A2A proxy (same
# path as initial_prompt). The agent's own concurrency control
+3 -102
View File
@@ -279,7 +279,7 @@ class TestToolDelegateTask:
patch("a2a_tools.report_activity", new=AsyncMock()):
result = await a2a_tools.tool_delegate_task("ws-1", "do something")
assert result == "Task completed!"
assert result == "[A2A_RESULT_FROM_PEER]\nTask completed!\n[/A2A_RESULT_FROM_PEER]"
async def test_error_response_returns_delegation_failed_message(self):
"""When send_a2a_message returns _A2A_ERROR_PREFIX text, delegation fails."""
@@ -307,7 +307,7 @@ class TestToolDelegateTask:
patch("a2a_tools.report_activity", new=AsyncMock()):
result = await a2a_tools.tool_delegate_task("ws-cached", "task")
assert result == "done"
assert result == "[A2A_RESULT_FROM_PEER]\ndone\n[/A2A_RESULT_FROM_PEER]"
async def test_peer_name_falls_back_to_id_prefix(self):
"""When peer has no name and cache is empty, name = first 8 chars of workspace_id."""
@@ -321,110 +321,11 @@ class TestToolDelegateTask:
patch("a2a_tools.report_activity", new=AsyncMock()):
result = await a2a_tools.tool_delegate_task("ws-nona000", "task")
assert result == "ok"
assert result == "[A2A_RESULT_FROM_PEER]\nok\n[/A2A_RESULT_FROM_PEER]"
# Cache should now have been set
assert a2a_tools._peer_names.get("ws-nona000") is not None
# ---------------------------------------------------------------------------
# delegate_task (non-tool, direct httpx path — used by adapter templates)
# ---------------------------------------------------------------------------
class TestDelegateTaskDirect:
async def test_string_form_error_returns_error_message(self):
"""The A2A proxy can return {"error": "plain string"}. Must not raise
AttributeError: 'str' object has no attribute 'get'."""
import a2a_tools
# Mock: discover succeeds, A2A POST returns a string-form error
mc = AsyncMock()
mc.__aenter__ = AsyncMock(return_value=mc)
mc.__aexit__ = AsyncMock(return_value=False)
async def fake_post(url, **kwargs):
r = MagicMock()
r.status_code = 200
r.json = MagicMock(return_value={"error": "peer workspace unreachable"})
return r
async def fake_get(url, **kwargs):
r = MagicMock()
r.status_code = 200
r.json = MagicMock(return_value={"url": "http://peer.svc/a2a"})
return r
mc.post = fake_post
mc.get = fake_get
with patch("a2a_tools.httpx.AsyncClient", return_value=mc):
result = await a2a_tools.delegate_task("ws-peer-123", "do a thing")
assert "Error" in result
assert "peer workspace unreachable" in result
async def test_dict_form_error_returns_error_message(self):
"""{"error": {"message": "...", "code": ...}} — the pre-existing path."""
import a2a_tools
mc = AsyncMock()
mc.__aenter__ = AsyncMock(return_value=mc)
mc.__aexit__ = AsyncMock(return_value=False)
async def fake_post(url, **kwargs):
r = MagicMock()
r.status_code = 200
r.json = MagicMock(return_value={"error": {"message": "internal server error", "code": 500}})
return r
async def fake_get(url, **kwargs):
r = MagicMock()
r.status_code = 200
r.json = MagicMock(return_value={"url": "http://peer.svc/a2a"})
return r
mc.post = fake_post
mc.get = fake_get
with patch("a2a_tools.httpx.AsyncClient", return_value=mc):
result = await a2a_tools.delegate_task("ws-peer-456", "do a thing")
assert "Error" in result
assert "internal server error" in result
async def test_success_returns_result_text(self):
"""Happy path: result with parts returns the first text part."""
import a2a_tools
mc = AsyncMock()
mc.__aenter__ = AsyncMock(return_value=mc)
mc.__aexit__ = AsyncMock(return_value=False)
async def fake_post(url, **kwargs):
r = MagicMock()
r.status_code = 200
r.json = MagicMock(return_value={
"result": {
"parts": [{"kind": "text", "text": "Task done!"}]
}
})
return r
async def fake_get(url, **kwargs):
r = MagicMock()
r.status_code = 200
r.json = MagicMock(return_value={"url": "http://peer.svc/a2a"})
return r
mc.post = fake_post
mc.get = fake_get
with patch("a2a_tools.httpx.AsyncClient", return_value=mc):
result = await a2a_tools.delegate_task("ws-peer-789", "do a thing")
assert result == "Task done!"
# ---------------------------------------------------------------------------
# tool_delegate_task_async
# ---------------------------------------------------------------------------
+50 -45
View File
@@ -4,77 +4,82 @@ The idle loop skips sending the idle prompt when DELEGATION_RESULTS_FILE
contains unconsumed results, preventing the agent from composing a stale tick
before processing pending delegation notifications from the heartbeat.
Source: workspace/main.py:_run_idle_loop() pending-results guard.
Source: ``workspace/main.py:_check_delegation_results_pending()`` (extracted from
``_run_idle_loop()`` guard; see PR #432 follow-up).
The guard is extracted into a module-level function so unit tests call the
real production logic directly — not a mirror copy. This avoids the
test-mirror anti-pattern (issue #401) where a copied implementation
drifts from the production code it is supposed to test.
"""
from __future__ import annotations
import io
import json
from unittest.mock import patch
import pytest
def check_results_pending(file_path: str) -> bool:
"""Mirror the guard logic from workspace/main.py:_run_idle_loop().
Returns True if the results file exists and is non-empty,
meaning the idle loop should skip this tick.
"""
try:
with open(file_path) as rf:
rf.seek(0)
content = rf.read().strip()
return bool(content)
except FileNotFoundError:
return False
from main import _check_delegation_results_pending
class TestIdleLoopPendingCheck:
"""Tests for the idle-loop pending-delegation-results guard."""
"""Tests for the idle-loop pending-delegation-results guard.
def test_no_file_means_proceed(self, tmp_path):
Each test patches ``builtins.open`` so ``_check_delegation_results_pending``
reads the controlled payload instead of the real DELEGATION_RESULTS_FILE.
No filesystem side-effects.
"""
def _patch_open(self, payload: str | None):
"""Patch builtins.open for _check_delegation_results_pending.
Args:
payload: file contents to return. None → FileNotFoundError.
"""
if payload is None:
return patch("builtins.open", side_effect=FileNotFoundError)
else:
fake_file = io.StringIO(payload)
return patch("builtins.open", return_value=fake_file)
def test_no_file_means_proceed(self):
"""No delegation results file → idle loop fires normally."""
results_file = tmp_path / "delegation_results.jsonl"
assert not check_results_pending(str(results_file))
with self._patch_open(None):
assert _check_delegation_results_pending() is False
def test_empty_file_means_proceed(self, tmp_path):
def test_empty_file_means_proceed(self):
"""Empty file → no pending results → idle loop fires."""
results_file = tmp_path / "delegation_results.jsonl"
results_file.write_text("", encoding="utf-8")
assert not check_results_pending(str(results_file))
with self._patch_open(""):
assert _check_delegation_results_pending() is False
def test_whitespace_only_file_means_proceed(self, tmp_path):
def test_whitespace_only_file_means_proceed(self):
"""File with only whitespace → treated as empty → idle loop fires."""
results_file = tmp_path / "delegation_results.jsonl"
results_file.write_text(" \n ", encoding="utf-8")
assert not check_results_pending(str(results_file))
with self._patch_open(" \n "):
assert _check_delegation_results_pending() is False
def test_single_result_means_skip(self, tmp_path):
def test_single_result_means_skip(self):
"""File with one delegation result → skip idle tick."""
results_file = tmp_path / "delegation_results.jsonl"
results_file.write_text(
payload = (
json.dumps({
"status": "completed",
"delegation_id": "del-abc",
"summary": "Done",
}) + "\n",
encoding="utf-8",
}) + "\n"
)
assert check_results_pending(str(results_file))
with self._patch_open(payload):
assert _check_delegation_results_pending() is True
def test_multiple_results_means_skip(self, tmp_path):
def test_multiple_results_means_skip(self):
"""File with multiple delegation results → skip idle tick."""
results_file = tmp_path / "delegation_results.jsonl"
results_file.write_text(
payload = (
json.dumps({"status": "completed", "delegation_id": "del-1", "summary": "A"})
+ "\n"
+ json.dumps({"status": "failed", "delegation_id": "del-2", "summary": "B"})
+ "\n",
encoding="utf-8",
+ "\n"
)
assert check_results_pending(str(results_file))
with self._patch_open(payload):
assert _check_delegation_results_pending() is True
def test_file_with_only_newline_means_proceed(self, tmp_path):
def test_file_with_only_newline_means_proceed(self):
"""File with only a newline character → stripped to empty → fires."""
results_file = tmp_path / "delegation_results.jsonl"
results_file.write_text("\n", encoding="utf-8")
assert not check_results_pending(str(results_file))
with self._patch_open("\n"):
assert _check_delegation_results_pending() is False