Compare commits

23 Commits

Author SHA1 Message Date
Molecule AI Dev Engineer A (Kimi) 48b6011e17 fix(2047): pass workspaceID to stripPluginMarkersFromMemory
2026-06-05 04:09:20 +00:00
Molecule AI Dev Engineer A (Kimi) cc99d3fff4 fix(plugins): log silently ignored execAsRoot errors during uninstall
Plugin uninstall had two sites where execAsRoot errors were discarded:
- Skill directory removal (plugins_install.go:125) — orphaned skill dirs
  if rm -rf failed silently
- CLAUDE.md marker stripping (plugins_install_pipeline.go:326) — stale
  plugin content left in CLAUDE.md if awk script failed

Both now log the error without failing the overall uninstall (best-effort
 cleanup), giving operators visibility into incomplete uninstalls.
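The best-effort pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from plugins_install.go: the `runner` type stands in for execAsRoot (whose real signature is not shown in this commit), and `bestEffortCleanup` and `failingRunner` are hypothetical names.

```go
package main

import (
	"errors"
	"log"
)

// runner is a hypothetical stand-in for execAsRoot; the real signature
// is not shown in the commit message.
type runner func(cmd string) error

// failingRunner simulates a command that always fails (demonstration only).
func failingRunner(string) error {
	return errors.New("simulated failure")
}

// bestEffortCleanup runs every cleanup step, logs each failure instead of
// discarding it, and never aborts early, so the uninstall can proceed.
// It returns how many steps failed.
func bestEffortCleanup(run runner, plugin string, steps []string) int {
	failed := 0
	for _, step := range steps {
		if err := run(step); err != nil {
			// Previously the error was dropped; logging it is the whole change.
			log.Printf("plugin uninstall %q: cleanup step %q failed: %v", plugin, step, err)
			failed++
		}
	}
	return failed
}
```

Returning the failure count rather than an error keeps the caller on the success path while still letting it report an incomplete uninstall if it chooses to.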

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-05 03:52:07 +00:00
hongming 71010e618a Merge pull request 'test(e2e): live staging e2e — reconciler heals a terminated EC2 (core#2261)' (#2270) from feat/core2261-reconciler-live-e2e into main
2026-06-05 01:11:52 +00:00
hongming 53ec08cbdb test(e2e): live staging e2e — reconciler heals a terminated EC2 (core#2261)
Provisions a real staging workspace, terminates its EC2 out-of-band, and
asserts the core#2261 instance-state reconciler heals it against real infra.

PRIMARY assertion (gate): within ~180s the workspace status leaves 'online'
— the reconciler detected the dead instance via CPProvisioner.IsRunning and
flipped it. A terminated EC2 masquerading as 'online' is exactly the
core#2247 regression this guards.

SECONDARY assertion (best-effort, ~600s): the onOffline -> RestartByID
existing-volume heal brings it back to 'online' on a NEW instance_id. Logged
but non-fatal — PRIMARY is the gate; a future tightening to a hard fail is
one edit away (noted in the script).

Kill primitive: aws ec2 terminate-instances on the captured instance_id
(falls back to slug-tag describe). Teardown is guaranteed by an up-front
EXIT/INT/TERM trap that deletes the tenant + leak-sweeps slug-tagged EC2
(reuses lib/aws_leak_check.sh), so a mid-test failure never orphans a box.

Real-infra complement to the deterministic unit tests
(cp_instance_reconciler.go). New workflow e2e-staging-reconciler.yml fires on
reconciler/script/lib changes + a daily schedule. NON-required initially
(continue-on-error: true) — promote to branch-required once green on main for
a de-flake window.

Refs core#2261, core#2247.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 18:09:17 -07:00
hongming d34d09db01 Merge pull request 'test(display): integration test for the take-control WS-proxy + signed-token path (core#2261)' (#2269) from feat/core2261-takecontrol-wsproxy-test into main
2026-06-05 00:47:13 +00:00
hongming d7484f7df4 test(display): integration test for the take-control WS-proxy + signed-token path (core#2261)
Server-side integration test for the workspace-server DisplaySession
WS-proxy + signed-token handshake, covering the WS-1006 regression
surface (proxy upgrade + token validation + bidirectional bytes) from
core#2247 — without any EC2/desktop/noVNC.

Positive: valid signed token + active lock + enabled display upgrades
(HTTP 101), the fake websockify backend's RFB greeting arrives through
the proxy, and a client->server byte echoes back end-to-end.

Negative (table-driven): missing token (403), tampered token (403),
expired lock (403), display mode none (404), empty instance_id (503),
wrong proxyPath (404) — each asserts no upgrade and no leak to upstream.

displayForward is overridden to a fake httptest websockify backend and
DB reads are sqlmock-ed, mirroring the sibling display-control test
harness. Complements the canvas reconnect unit tests (DisplayTab).

Refs core#2261, core#2247.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 17:40:32 -07:00
claude-ceo-assistant 1d88a6ed0e Merge pull request 'fix(canvas): platform-managed provider needs no user credential (#2245)' (#2246) from fix/2245-platform-managed-no-cred into main
2026-06-05 00:33:17 +00:00
claude-ceo-assistant 1818d03014 Merge pull request 'fix(activity): deterministic since_id feed ordering — monotonic seq tiebreaker (#2339)' (#2258) from fix/activity-feed-stable-ordering into main
2026-06-05 00:32:04 +00:00
hongming 8812285932 Merge pull request 'feat(registry): reconcile online workspaces against real EC2 state — auto-heal terminated instances (core#2261)' (#2266) from feat/core2261-instance-state-reconciler into main
2026-06-05 00:28:31 +00:00
hongming 48aebdfcc4 feat(registry): reconcile online workspaces against real EC2 state — auto-heal terminated instances (core#2261)
Root cause (core#2247): every existing liveness sweep keys off a PROXY
(Redis TTL, agent heartbeat, local Docker, or runtime='external'). A SaaS
claude-code workspace whose EC2 was terminated/stopped falls through ALL
of them and stays status=online pointing at a dead instance_id forever.

Adds StartCPInstanceReconciler: a 60s sweep that asks the ONE
authoritative question the others lack — CPProvisioner.IsRunning (CP
DescribeInstances-equivalent) — for each online SaaS row, and on a clean
"not running" feeds it into the existing onWorkspaceOffline closure
(status flip + RestartByID reprovision, existing volume).

Guardrails: fail-safe (IsRunning returns (true, err) on any transient
error → never flip); online + SaaS-EC2 rows only (runtime <> 'external');
per-cycle LIMIT 200 + per-workspace timeout.

Refs core#2261, core#2247.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 17:21:16 -07:00
hongming 2304d84b46 Merge pull request 'test(providers): unbreak main — namespaced vendor id is BYOK-routable (cp#529)' (#2265) from fix/cp529-enforcer-test-unbreak-main into main
2026-06-05 00:00:07 +00:00
hongming 484a257067 test(providers): unbreak main — namespaced vendor id is now BYOK-routable (cp#529)
core#2262 merged via a race on the pre-fix commit, so main carries the stale
`platform_shared_openai_namespaced_still_rejected` assertion while the
byok-vendor providers (also in that merge) make hermes openai/gpt-4o routable
via the tenant's key. Flip the assertion to allowed. Unbreaks CI/Platform(Go).

cp#529

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 16:49:06 -07:00
core-devops 197409f10d fix(activity): precise seq backfill comment + backfill regression test (#2339)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 15s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 3s
E2E Chat / detect-changes (pull_request) Successful in 20s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 20s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 15s
Harness Replays / detect-changes (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
E2E API Smoke Test / detect-changes (pull_request) Successful in 38s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
gate-check-v3 / gate-check (pull_request_target) Successful in 9s
CI / Canvas (Next.js) (pull_request) Successful in 2s
Check migration collisions / Migration version collision check (pull_request) Successful in 49s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
sop-checklist / review-refire (pull_request_target) Has been skipped
security-review / approved (pull_request_target) Failing after 4s
sop-tier-check / tier-check (pull_request_target) Successful in 5s
E2E Chat / E2E Chat (pull_request) Successful in 2s
qa-review / approved (pull_request_target) Failing after 13s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 15s
CI / Canvas Deploy Status (pull_request) Has been skipped
Harness Replays / Harness Replays (pull_request) Successful in 9s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m2s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 59s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m17s
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Successful in 3s
CI / Platform (Go) (pull_request) Successful in 4m2s
CI / all-required (pull_request) Successful in 1s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m32s
audit-force-merge / audit (pull_request_target) Successful in 11s
Addresses the two valid sub-points in CR2's review of #2258, while the core
claim (existing rows left NULL) is empirically disproven.

EMPIRICAL GROUND TRUTH (PostgreSQL 16.13 prod, re-confirmed on 16.14):
adding `seq BIGINT GENERATED BY DEFAULT AS IDENTITY` to a populated
activity_logs REWRITES the table and assigns seq to EXISTING rows during the
ALTER in physical table-scan order (x=1..5 -> seq=1..5, all NON-NULL); the
identity sequence then advances ABOVE max(seq) so the next INSERT gets seq=6
with no collision. The migration is correct; rows do NOT stay NULL.

1) Comment precision: the up.sql overclaimed seq as a "gap-free monotonically
   increasing value in INSERT (commit) order". Replaced with an accurate
   statement — seq is a UNIQUE, monotonic-once-assigned tiebreaker that is NOT
   gap-free (rollbacks burn values) and NOT a strict commit-order guarantee
   under concurrency; neither property is needed, because any total, stable
   tiebreaker makes (created_at, seq) a deterministic order. Documents the
   table-rewrite backfill + sequence-advances-past-max behavior explicitly.

2) Backfill regression test (the coverage CR2 correctly said was missing):
   new activity_seq_backfill_integration_test.go against real Postgres pins
   the invariant the migration guarantees —
     - _SeqBackfill_NoNull: after migrations, NO activity_logs row has NULL
       seq (per-workspace and table-wide), and the IDENTITY default yields
       distinct, strictly-increasing, non-null seq for fresh inserts.
     - _SeqBackfill_SinceIDOnBackfilledRow: a row whose seq came purely from
       the IDENTITY default (the same mechanism that backfills pre-existing
       rows) is usable as a since_id cursor — its seq is non-null and a second
       row sharing its exact created_at microsecond is returned, not dropped.
   Proven to FAIL if seq were nullable/un-backfilled (ran against a mutant
   schema with a plain nullable seq column: both tests trip) and PASS as-is.

go build ./... + go vet -tags=integration ./internal/handlers/ clean;
integration suite green (SinceID|Seq|Backfill|Ordering) on PG16.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 16:39:19 -07:00
hongming 6a44d8b175 Merge pull request 'feat(providers): dedicated BYOK-vendor providers make hermes/openclaw vendor menus routable (cp#529)' (#2262) from feat/cp529-byok-vendor-providers into main
ci-arm64-advisory / fast-checks (push) Waiting to run
CI / Python Lint & Test (push) Successful in 4s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Successful in 2s
Block internal-flavored paths / Block forbidden paths (push) Successful in 21s
E2E API Smoke Test / detect-changes (push) Successful in 18s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 18s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Successful in 5s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (push) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 4s
sync-providers-yaml / Compare synced providers.yaml against controlplane canonical (push) Successful in 3s
E2E Staging SaaS (full lifecycle) / pr-validate (push) Successful in 28s
CI / Detect changes (push) Successful in 44s
E2E Chat / detect-changes (push) Successful in 42s
Handlers Postgres Integration / detect-changes (push) Successful in 27s
Harness Replays / detect-changes (push) Successful in 25s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 5s
CI / Shellcheck (E2E scripts) (push) Successful in 7s
CI / Canvas (Next.js) (push) Successful in 7s
verify-providers-gen / Regenerate providers artifact and fail on drift (push) Successful in 25s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 52s
Harness Replays / Harness Replays (push) Successful in 50s
CI / Canvas Deploy Status (push) Successful in 49s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (push) Failing after 2m17s
E2E Chat / E2E Chat (push) Successful in 2m17s
publish-workspace-server-image / build-and-push (push) Successful in 3m11s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 3m18s
CI / Platform (Go) (push) Failing after 5m47s
CI / all-required (push) Has been skipped
publish-workspace-server-image / Production auto-deploy (push) Failing after 3m10s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (push) Failing after 7m22s
2026-06-04 23:38:32 +00:00
hongming 79162509d0 feat(providers): dedicated BYOK-vendor providers make hermes/openclaw vendor menus routable (cp#529)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 2s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
E2E Chat / detect-changes (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 5s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 27s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 2s
Harness Replays / detect-changes (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
CI / Python Lint & Test (pull_request) Successful in 1m9s
gate-check-v3 / gate-check (pull_request_target) Successful in 10s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 21s
security-review / approved (pull_request_target) Failing after 8s
qa-review / approved (pull_request_target) Failing after 8s
verify-providers-gen / Regenerate providers artifact and fail on drift (pull_request) Successful in 25s
CI / Canvas (Next.js) (pull_request) Successful in 2s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Harness Replays / Harness Replays (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
CI / Canvas Deploy Status (pull_request) Has been skipped
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m1s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 2m16s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 57s
sop-tier-check / tier-check (pull_request_target) Has been cancelled
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
sync-providers-yaml / Compare synced providers.yaml against controlplane canonical (pull_request) Successful in 4s
sop-checklist / all-items-acked (pull_request_target) Successful in 4s
sop-checklist / review-refire (pull_request_target) Has been skipped
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Successful in 6s
CI / Platform (Go) (pull_request) Failing after 3m23s
CI / all-required (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Failing after 5m54s
audit-force-merge / audit (pull_request_target) Successful in 6s
Byte-synced mirror of the canonical change in molecule-controlplane
internal/providers/providers.yaml: add 5 NON-PLATFORM BYOK-vendor
provider entries (byok-anthropic, byok-openai, byok-gemini,
byok-minimax, groq) and wire them as name-only prefix-routing arms
into the hermes / openclaw / codex runtime native sets so the 20
residual ids cp#529 flagged as drift become routable with the
TENANT's OWN vendor key (billing-safe), not the platform-shared key.

- hermes: + byok-anthropic, byok-gemini, byok-openai, byok-minimax (12 ids)
- openclaw: + byok-openai, byok-minimax, groq (7 ids; runtime DEFAULT
  minimax:MiniMax-M2.7 now resolves)
- codex: + byok-minimax (codex-minimax-m2.7 via narrow ^codex-minimax- leg)

Billing-safe: every new provider IsPlatform()==false -> BYOK billing.
Collision-free: all matchers namespaced, disjoint from the platform
vendors' bare matchers; DeriveProvider resolves all 20 ids +
codex-minimax-m2.7 to exactly one non-platform provider.

This is the molecule-core SIDE of the synced registry: providers.yaml
is byte-identical to controlplane's (diff -u empty), registry_gen.go
regenerated, and canonicalProvidersYAMLSHA256 bumped to the new
canonical sha. The two PRs must land together.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 16:31:03 -07:00
hongming 9b19759ceb Merge pull request 'feat(providers): BYOK-routability-aware workspace-create enforcer (cp#529)' (#2256) from feat/cp529-byok-routability-enforcer into main
ci-arm64-advisory / fast-checks (push) Waiting to run
Block internal-flavored paths / Block forbidden paths (push) Successful in 3s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Successful in 2s
CI / Python Lint & Test (push) Successful in 7s
Harness Replays / detect-changes (push) Successful in 5s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Successful in 4s
E2E Chat / detect-changes (push) Successful in 10s
E2E API Smoke Test / detect-changes (push) Successful in 10s
Harness Replays / Harness Replays (push) Successful in 2s
sync-providers-yaml / Compare synced providers.yaml against controlplane canonical (push) Successful in 5s
Handlers Postgres Integration / detect-changes (push) Successful in 12s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (push) Successful in 11s
CI / Detect changes (push) Successful in 15s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 14s
CI / Shellcheck (E2E scripts) (push) Successful in 2s
CI / Canvas (Next.js) (push) Successful in 2s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 12s
E2E Staging SaaS (full lifecycle) / pr-validate (push) Successful in 28s
verify-providers-gen / Regenerate providers artifact and fail on drift (push) Successful in 22s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 23s
CI / Canvas Deploy Status (push) Successful in 23s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 53s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (push) Failing after 2m22s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 3m22s
E2E Chat / E2E Chat (push) Successful in 5m31s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (push) Failing after 6m7s
CI / Platform (Go) (push) Successful in 7m17s
CI / all-required (push) Successful in 14s
publish-workspace-server-image / build-and-push (push) Successful in 8m19s
publish-workspace-server-image / Production auto-deploy (push) Successful in 2m34s
2026-06-04 23:07:42 +00:00
core-devops acdb368a4f fix(canvas): re-apply #2245 platform-managed source (HEAD reverted it via bad rebase)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Python Lint & Test (pull_request) Successful in 8s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 1s
Harness Replays / detect-changes (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 11s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
gate-check-v3 / gate-check (pull_request_target) Successful in 4s
qa-review / approved (pull_request_target) Failing after 3s
security-review / approved (pull_request_target) Failing after 4s
Harness Replays / Harness Replays (pull_request) Successful in 1s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, l
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 3s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 23s
sop-tier-check / tier-check (pull_request_target) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1s
CI / Detect changes (pull_request) Successful in 30s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 20s
E2E Chat / detect-changes (pull_request) Successful in 29s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 25s
CI / Platform (Go) (pull_request) Successful in 1s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m3s
CI / Canvas (Next.js) (pull_request) Successful in 6m46s
CI / Canvas Deploy Status (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 25s
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Successful in 7s
audit-force-merge / audit (pull_request_target) Successful in 6s
HEAD 911d9ce3 was labeled test-only but its rebase took the pre-fix source
blobs, deleting the isPlatformManagedProvider helper + its 3 call-sites that
21268f0f had correctly added — so the new #2245 tests ran against un-fixed
source (6 reds: 'isPlatformManagedProvider is not a function' x4 + missing
'Platform-managed — no API key required.' copy x2). Mechanism = clobbered
source, NOT a flake. Restores both files to 21268f0f. SSOT: helper defined
once in ProviderModelSelector, imported in the dialog. Canvas suite 3342 pass / 0 fail.
2026-06-04 15:55:46 -07:00
core-devops 5f6b9b242e fix(activity): carry seq through session-search CTE — 500 on Handlers PG Integration
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 1s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
Check migration collisions / Migration version collision check (pull_request) Successful in 12s
CI / Python Lint & Test (pull_request) Successful in 2s
CI / Detect changes (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
E2E Chat / detect-changes (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
Harness Replays / detect-changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 9s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
gate-check-v3 / gate-check (pull_request_target) Successful in 5s
qa-review / approved (pull_request_target) Failing after 4s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
security-review / approved (pull_request_target) Failing after 4s
sop-checklist / all-items-acked (pull_request_target) Successful in 3s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-tier-check / tier-check (pull_request_target) Successful in 5s
CI / Canvas (Next.js) (pull_request) Successful in 1s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
E2E Chat / E2E Chat (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m2s
CI / Canvas Deploy Status (pull_request) Has been skipped
Harness Replays / Harness Replays (pull_request) Successful in 14s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 58s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m19s
CI / Platform (Go) (pull_request) Successful in 4m15s
CI / all-required (pull_request) Successful in 2s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m19s
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Successful in 5s
The prior commit added ORDER BY created_at DESC, seq DESC to
buildSessionSearchQuery, but the outer SELECT reads from the
session_items CTE whose projection did NOT include seq. An outer ORDER BY
can only reference the CTE's output columns, so real Postgres raised
`column "seq" does not exist` -> SessionSearch 500 ->
TestIntegration_SessionSearch_Basic/_EmptyQuery failed the Handlers
Postgres Integration job. sqlmock missed it (regex-matches the query
string, never executes it).

Fix: project seq through session_items so the outer ORDER BY can see it.
Integration suite green (incl. the two SinceID ordering proofs).
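
The projection rule above can be sketched in Go as a simplified query builder. This is an illustration, not the real buildSessionSearchQuery: the function names, the `projectSeq` switch, and the assumption that the CTE reads `id` from `activity_logs` filtered by `workspace_id` are all hypothetical. Only `session_items`, `created_at`, and `seq` come from the message above.

```go
package main

import (
	"fmt"
	"strings"
)

// buildSessionSearchSketch is a hypothetical, simplified stand-in for
// buildSessionSearchQuery. With projectSeq=false it reproduces the broken
// shape: the outer ORDER BY names seq, but the CTE does not expose it.
func buildSessionSearchSketch(projectSeq bool) string {
	cols := "id, created_at"
	if projectSeq {
		cols += ", seq" // the fix: carry seq through the CTE projection
	}
	return strings.Join([]string{
		"WITH session_items AS (",
		"    SELECT " + cols,
		"    FROM activity_logs", // assumed source table
		"    WHERE workspace_id = $1",
		")",
		"SELECT id, created_at",
		"FROM session_items",
		"ORDER BY created_at DESC, seq DESC",
	}, "\n")
}

// cteProjectsSeq reports whether the SELECT list inside the first
// parenthesized CTE body contains a column named exactly "seq".
func cteProjectsSeq(query string) bool {
	open := strings.Index(query, "(")
	end := strings.Index(query, ")")
	if open < 0 || end < open {
		return false
	}
	cte := query[open+1 : end]
	from := strings.Index(cte, "FROM")
	if from < 0 {
		return false
	}
	list := strings.TrimPrefix(strings.TrimSpace(cte[:from]), "SELECT")
	for _, col := range strings.Split(list, ",") {
		if strings.TrimSpace(col) == "seq" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(cteProjectsSeq(buildSessionSearchSketch(false)))
	fmt.Println(cteProjectsSeq(buildSessionSearchSketch(true)))
}
```

A string-level check like this has the same blind spot as sqlmock: it never executes the SQL. It only illustrates the shape of the fix; the integration run against real Postgres remains the actual proof.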

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 15:34:41 -07:00
core-devops 8517b8e776 fix(activity): tiebreak the (unused) session-search query too — no unstable sorts (§ No flakes)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 8s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 1s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
Check migration collisions / Migration version collision check (pull_request) Successful in 17s
E2E Chat / detect-changes (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 15s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
gate-check-v3 / gate-check (pull_request_target) Successful in 5s
sop-checklist / review-refire (pull_request_target) Has been skipped
qa-review / approved (pull_request_target) Failing after 5s
security-review / approved (pull_request_target) Failing after 4s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 16s
sop-checklist / all-items-acked (pull_request_target) Successful in 6s
CI / Canvas (Next.js) (pull_request) Successful in 3s
sop-tier-check / tier-check (pull_request_target) Successful in 5s
E2E Chat / E2E Chat (pull_request) Successful in 2s
Harness Replays / Harness Replays (pull_request) Successful in 1s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
CI / Canvas Deploy Status (pull_request) Has been skipped
CI / Shellcheck (E2E scripts) (pull_request) Successful in 15s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 54s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 1m11s
CI / Platform (Go) (pull_request) Successful in 4m0s
CI / all-required (pull_request) Successful in 2s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m29s
buildSessionSearchQuery ORDER BY created_at DESC had the same missing-tiebreaker
non-determinism as the since_id feed. Unused in production, but the seq column
now exists and leaving a known unstable sort violates dev-sop § No flakes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 15:17:41 -07:00
core-devops d0edc74dc0 fix(activity): deterministic since_id feed ordering — monotonic seq tiebreaker (#2339)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Python Lint & Test (pull_request) Successful in 3s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
E2E Chat / detect-changes (pull_request) Successful in 9s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 2s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 12s
sop-checklist / review-refire (pull_request_target) Has been skipped
CI / Detect changes (pull_request) Successful in 15s
Harness Replays / detect-changes (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 15s
qa-review / approved (pull_request_target) Failing after 5s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
security-review / approved (pull_request_target) Failing after 5s
gate-check-v3 / gate-check (pull_request_target) Successful in 7s
sop-checklist / all-items-acked (pull_request_target) Successful in 6s
Check migration collisions / Migration version collision check (pull_request) Successful in 21s
E2E Chat / E2E Chat (pull_request) Successful in 10s
sop-tier-check / tier-check (pull_request_target) Successful in 20s
CI / Canvas (Next.js) (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
Harness Replays / Harness Replays (pull_request) Successful in 1s
CI / Canvas Deploy Status (pull_request) Has been skipped
CI / Shellcheck (E2E scripts) (pull_request) Successful in 39s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m4s
ci-arm64-advisory / fast-checks (pull_request) Has been cancelled
CI / all-required (pull_request) Has been cancelled
CI / Platform (Go) (pull_request) Has been cancelled
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m32s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m11s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Waiting to run
The poll-mode since_id feed ordered by created_at with NO tiebreaker, and
activity_logs.id is a random UUID (no monotonic column) — same-microsecond
rows came back in arbitrary planner order, intermittently flipping
hello-from-e2e-2|hello-from-e2e-3 in test_poll_mode_e2e.sh. Not a flake: a
missing tiebreaker (per dev-sop § No flakes). Second bug fixed: the since_id
cursor filtered created_at > X strictly, silently dropping a row written in
the cursor row's microsecond.

Fix: add monotonic seq BIGINT GENERATED BY DEFAULT AS IDENTITY (idempotent) +
(workspace_id, created_at, seq) index; ORDER BY (created_at, seq); cursor
compares the full (created_at, seq) tuple. Integration test (real PG) proves
red->green incl. the boundary row (fails 5/5 pre-fix). Unit sqlmock updated.
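
The ordering and cursor rule described above can be sketched in memory, without a database. This is a minimal illustration under stated assumptions: the `Row` type, the `less` and `After` helpers, and the integer microsecond timestamps are hypothetical stand-ins, not the handler's actual code. It shows why comparing the full (created_at, seq) tuple keeps a row written in the cursor row's own microsecond, which a strict `created_at > X` filter would drop.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a hypothetical in-memory stand-in for an activity_logs row.
type Row struct {
	CreatedAtMicros int64 // created_at at microsecond resolution
	Seq             int64 // identity-assigned tiebreaker
	Body            string
}

// less orders rows by the (created_at, seq) tuple.
func less(a, b Row) bool {
	if a.CreatedAtMicros != b.CreatedAtMicros {
		return a.CreatedAtMicros < b.CreatedAtMicros
	}
	return a.Seq < b.Seq
}

// After returns the rows strictly after the cursor in (created_at, seq)
// order, sorted by that same tuple.
func After(rows []Row, cursor Row) []Row {
	out := []Row{}
	for _, r := range rows {
		if less(cursor, r) {
			out = append(out, r)
		}
	}
	sort.Slice(out, func(i, j int) bool { return less(out[i], out[j]) })
	return out
}

func main() {
	// Two rows share the same microsecond; only seq separates them.
	rows := []Row{
		{1000, 3, "hello-from-e2e-3"},
		{1000, 2, "hello-from-e2e-2"},
		{999, 1, "hello-from-e2e-1"},
	}
	// The cursor sits on the seq=2 row. The seq=3 row shares its timestamp
	// and is still returned because the whole tuple is compared.
	for _, r := range After(rows, Row{CreatedAtMicros: 1000, Seq: 2}) {
		fmt.Println(r.Body)
	}
}
```

Because seq is unique, the tuple comparison is a total order, so the result does not depend on the order in which rows arrive.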

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 15:16:10 -07:00
hongming ddc4e8190c feat(providers): BYOK-routability-aware workspace-create enforcer (cp#529)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 1s
CI / Detect changes (pull_request) Successful in 6s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
CI / Python Lint & Test (pull_request) Successful in 6s
E2E API Smoke Test / detect-changes (pull_request) Successful in 7s
Harness Replays / detect-changes (pull_request) Successful in 4s
E2E Chat / detect-changes (pull_request) Successful in 8s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 10s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 10s
qa-review / approved (pull_request_target) Failing after 6s
gate-check-v3 / gate-check (pull_request_target) Successful in 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 13s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 16s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 5s
security-review / approved (pull_request_target) Failing after 9s
Harness Replays / Harness Replays (pull_request) Successful in 1s
E2E Chat / E2E Chat (pull_request) Successful in 8s
verify-providers-gen / Regenerate providers artifact and fail on drift (pull_request) Successful in 27s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 55s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
CI / Canvas Deploy Status (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 34s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3m28s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 2m23s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m44s
CI / Platform (Go) (pull_request) Successful in 4m12s
CI / all-required (pull_request) Successful in 2s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Failing after 5m10s
sop-checklist / review-refire (pull_request_target) Has been skipped
sync-providers-yaml / Compare synced providers.yaml against controlplane canonical (pull_request) Successful in 12s
sop-tier-check / tier-check (pull_request_target) Successful in 4s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 59s
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Successful in 4s
audit-force-merge / audit (pull_request_target) Successful in 4s
validateRegisteredModelForRuntime now allows a model if it is on the
runtime's platform menu (ModelsForRuntime) OR DeriveProvider resolves a
native provider — the CTO-approved Option C routability path. Wire
confirmed-non-platform BYOK providers into claude-code/hermes/openclaw as
name-only native arms (zero platform-menu change) + widen their prefix
matchers to accept both slash and colon BYOK id forms.
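
The allow rule described above can be sketched as follows. This is a minimal illustration and not the Go implementation: `isModelAllowed`, `platformMenu`, `nativeByokVendors`, and `deriveProvider` are hypothetical stand-ins for validateRegisteredModelForRuntime, ModelsForRuntime, the name-only native arms, and DeriveProvider, and the vendor name `example-byok` and the menu contents are invented for the example.

```typescript
// Hypothetical sketch: every name and table entry below is an assumption made
// for illustration; none of it is taken from the Go source.
type Runtime = "claude-code" | "hermes" | "openclaw";

// Stand-in for ModelsForRuntime: ids on each runtime's platform menu.
const platformMenu: Record<Runtime, readonly string[]> = {
  "claude-code": ["moonshot/kimi-k2.6"],
  hermes: [],
  openclaw: [],
};

// Stand-in for the name-only native BYOK arms wired per runtime.
// Platform-shared vendors (openai, gemini, minimax, anthropic, groq) are
// deliberately absent, mirroring the billing guardrail.
const nativeByokVendors: Record<Runtime, readonly string[]> = {
  "claude-code": ["example-byok"],
  hermes: ["example-byok"],
  openclaw: ["example-byok"],
};

// Stand-in for DeriveProvider: accepts both "vendor/model" and "vendor:model".
function deriveProvider(runtime: Runtime, modelId: string): string | null {
  const match = /^([^/:]+)[/:](.+)$/.exec(modelId);
  if (match === null) return null;
  const vendor = match[1];
  return nativeByokVendors[runtime].includes(vendor) ? vendor : null;
}

// A model is allowed if it is on the platform menu OR a native provider resolves.
function isModelAllowed(runtime: Runtime, modelId: string): boolean {
  return (
    platformMenu[runtime].includes(modelId) ||
    deriveProvider(runtime, modelId) !== null
  );
}
```

Under these assumptions, `openai/gpt-4o` on hermes is rejected (it stays residual drift), while `example-byok/some-model` and `example-byok:some-model` both resolve.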

Billing guardrail: only non-platform (BYOK) providers are wired; the
platform-shared vendors (openai/gemini/minimax/anthropic, and groq which
has no provider) are deliberately NOT wired, so their ids stay residual
drift rather than billing a customer's model through the platform key.

claude-code now fully resolves. Residual drift is limited to platform-shared
ids (hermes: anthropic/, gemini/, openai/, minimax/; codex: codex-minimax;
openclaw: groq:, openai:, minimax:), which will be trimmed from templates or
restored via dedicated BYOK-vendor providers in a follow-up. Build and
providers/gen/handlers tests are green.

NOTE: overlaps files with open PR #2241 (cp#521, trim approach); co-review
and rebase before merge.

cp#529

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 14:36:39 -07:00
core-devops 911d9ce3c8 test(canvas): cover registry-backed platform billingMode suppression (#2245)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 3s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 9s
gate-check-v3 / gate-check (pull_request_target) Successful in 8s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 10s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 10s
Harness Replays / detect-changes (pull_request) Successful in 11s
E2E Chat / detect-changes (pull_request) Successful in 13s
sop-checklist / review-refire (pull_request_target) Has been skipped
qa-review / approved (pull_request_target) Failing after 6s
security-review / approved (pull_request_target) Failing after 5s
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, l
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
sop-tier-check / tier-check (pull_request_target) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 20s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 19s
E2E Chat / E2E Chat (pull_request) Successful in 2s
Harness Replays / Harness Replays (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 6s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m2s
CI / Python Lint & Test (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
CI / Platform (Go) (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Failing after 5m48s
CI / all-required (pull_request) Has been skipped
CI / Canvas Deploy Status (pull_request) Has been skipped
Independent review noted the integration test exercised only the legacy
vendor==="platform" branch; production uses the registry-backed
billingMode==="platform_managed" path. Add a registry fixture whose
platform provider declares auth_env:[MOLECULE_LLM_USAGE_TOKEN] and assert
end-to-end through buildProviderCatalogFromRegistry: field hidden, no
error, no secret in the create payload. Watch-it-fail verified red->green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 11:52:22 -07:00
core-devops 21268f0fe4 fix(canvas): platform-managed provider needs no user credential (#2245)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
CI / Python Lint & Test (pull_request) Successful in 3s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 2s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 6s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 10s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
E2E API Smoke Test / detect-changes (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 10s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 9s
qa-review / approved (pull_request_target) Failing after 6s
security-review / approved (pull_request_target) Failing after 5s
Harness Replays / Harness Replays (pull_request) Successful in 1s
CI / Detect changes (pull_request) Successful in 15s
E2E Chat / detect-changes (pull_request) Successful in 14s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
E2E Chat / E2E Chat (pull_request) Successful in 2s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, l
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 11s
gate-check-v3 / gate-check (pull_request_target) Successful in 17s
sop-tier-check / tier-check (pull_request_target) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 8s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m12s
CI / Canvas (Next.js) (pull_request) Successful in 6m14s
CI / Canvas Deploy Status (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 1s
The Create-workspace dialog blocked submission with "Provider credential
is required" for the platform-managed provider, even though platform-
managed mode injects its own usage token (MOLECULE_LLM_USAGE_TOKEN = the
tenant admin_token, set by the CP provisioner) and the user supplies no
key. The validation keyed only off envVars.length, with no exemption for
platform-managed; it also rendered a credential field for the internal
token and would have sent secrets:{MOLECULE_LLM_USAGE_TOKEN:""} on create,
clobbering the provisioner-injected token.

Add isPlatformManagedProvider() (vendor==="platform" ||
billingMode==="platform_managed") and gate the validation, the
credential-field render, and the secret-send on it. Platform-managed now
shows "no API key required" and sends no secret; BYOK is unchanged.
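
The validation gate and the secret-send gate described above can be sketched in isolation. The predicate body is the one this commit adds; `ProviderLike`, `credentialError`, and `secretsFor` are hypothetical names introduced only for this illustration and do not appear in the component.

```typescript
// Reduced provider shape; the real ProviderEntry may carry more fields.
interface ProviderLike {
  vendor: string;
  envVars: string[];
  billingMode?: "platform_managed" | "byok";
}

// Same logic as the predicate added in this commit.
function isPlatformManagedProvider(
  p?: Pick<ProviderLike, "vendor" | "billingMode"> | null,
): boolean {
  return p?.vendor === "platform" || p?.billingMode === "platform_managed";
}

// Hypothetical helper: returns the validation error, or null when submit may proceed.
function credentialError(p: ProviderLike, secret: string): string | null {
  if (isPlatformManagedProvider(p)) return null; // token is provisioner-injected
  if (p.envVars.length > 0 && secret.trim() === "") {
    return "Provider credential is required";
  }
  return null;
}

// Hypothetical helper: the secrets portion of the create payload.
// Platform-managed providers contribute nothing, so no empty value is sent.
function secretsFor(
  p: ProviderLike,
  secret: string,
): { secrets?: Record<string, string> } {
  if (isPlatformManagedProvider(p) || p.envVars.length === 0) return {};
  return { secrets: { [p.envVars[0]]: secret.trim() } };
}
```

Returning an empty object rather than `secrets: {}` keeps the key out of the payload entirely, which is what the tests in this change assert with `toBeUndefined()`.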

Tests: discriminating vitest (watch-it-fail verified red->green) — a
platform-managed provider WITH a declared auth env requires no credential,
hides the field, and sends no secret; BYOK still requires + renders the
field; + isPlatformManagedProvider unit cases. The prior mock masked the
bug by giving the platform provider required_env:[] — the new fixture
matches production (auth_env carries MOLECULE_LLM_USAGE_TOKEN).
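
Why the old mock could not catch the bug can be shown with a small sketch. It compares the pre-fix rule (keyed only off the env list, as described above) with the post-fix rule on two fixtures. `FixtureProvider`, `legacyNeedsCredential`, `fixedNeedsCredential`, and `discriminates` are hypothetical names for this illustration only.

```typescript
// Reduced fixture shape, assumed for this sketch.
interface FixtureProvider {
  name: string;
  auth_env: string[];
  billing_mode?: "platform_managed" | "byok";
}

// Pre-fix rule: a credential is demanded whenever any env var is declared.
function legacyNeedsCredential(p: FixtureProvider): boolean {
  return p.auth_env.length > 0;
}

// Post-fix rule: platform-managed providers are exempt.
function fixedNeedsCredential(p: FixtureProvider): boolean {
  const platformManaged =
    p.name === "platform" || p.billing_mode === "platform_managed";
  return !platformManaged && p.auth_env.length > 0;
}

// A fixture can expose the bug only if the two rules disagree on it.
function discriminates(p: FixtureProvider): boolean {
  return legacyNeedsCredential(p) !== fixedNeedsCredential(p);
}

// Old mock shape: empty env list, so both rules answer "no credential".
const maskedFixture: FixtureProvider = {
  name: "platform",
  auth_env: [],
  billing_mode: "platform_managed",
};

// Production-shaped fixture: the auth env carries the internal token.
const productionShapedFixture: FixtureProvider = {
  name: "platform",
  auth_env: ["MOLECULE_LLM_USAGE_TOKEN"],
  billing_mode: "platform_managed",
};
```

With the empty env list both rules agree, so a test built on it passes before and after the fix; only the production-shaped fixture makes the old rule fail.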

Fixes #2245

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 11:40:06 -07:00
26 changed files with 2637 additions and 293 deletions
+196
@@ -0,0 +1,196 @@
name: E2E Staging Reconciler (heals terminated EC2)
# Live staging proof for the core#2261 instance-state reconciler
# (workspace-server/internal/registry/cp_instance_reconciler.go). The
# real-infra complement to the deterministic unit tests: provisions a real
# staging workspace, TERMINATES its EC2, and asserts the reconciler flips it
# off 'online' (PRIMARY gate) and auto-reprovisions on a new instance_id
# (SECONDARY, best-effort). See
# tests/e2e/test_reconciler_heals_terminated_instance.sh for the assertion
# contract + timeouts.
#
# Modeled on e2e-staging-saas.yml. Same secrets + same Gitea-port caveats:
# - Dropped workflow_dispatch.inputs (Gitea 1.22.6 parser rejects them).
# - Dropped merge_group / environment (no Gitea equivalent).
# - Workflow-level env.GITHUB_SERVER_URL pinned per
# feedback_act_runner_github_server_url.
#
# NOT a required check (yet). This is a brand-new live E2E that provisions +
# terminates real EC2 (costs money, shares the cp#245 cold-boot flake
# surface). A new live e2e must NOT hard-gate every merge until it has a
# green track record. continue-on-error: true surfaces failures without
# blocking. PROMOTE to branch-required (flip continue-on-error → false AND
# add "E2E Staging Reconciler" to branch protection) once it has run green on
# main for several consecutive days — same de-flake discipline the
# platform-boot job in e2e-staging-saas.yml documents.
on:
# Run when the reconciler itself, the script, or the libs it depends on
# change — so a reconciler regression is caught on the PR that introduces
# it (paths filter), plus a daily schedule to catch infra/AMI drift.
push:
branches: [main]
paths:
- 'workspace-server/internal/registry/cp_instance_reconciler.go'
- 'tests/e2e/test_reconciler_heals_terminated_instance.sh'
- 'tests/e2e/lib/aws_leak_check.sh'
- 'tests/e2e/lib/model_slug.sh'
- '.gitea/workflows/e2e-staging-reconciler.yml'
pull_request:
branches: [main]
paths:
- 'workspace-server/internal/registry/cp_instance_reconciler.go'
- 'tests/e2e/test_reconciler_heals_terminated_instance.sh'
- 'tests/e2e/lib/aws_leak_check.sh'
- 'tests/e2e/lib/model_slug.sh'
- '.gitea/workflows/e2e-staging-reconciler.yml'
workflow_dispatch:
schedule:
# 08:00 UTC daily — offset from e2e-staging-saas (07:00) so the two live
# harnesses don't fight over staging's per-hour org-creation quota.
- cron: '0 8 * * *'
# Serialize against itself: staging has a finite per-hour org-creation quota,
# and a cancelled run mid-teardown leaks EC2. cancel-in-progress: false
# mirrors e2e-staging-saas.yml.
concurrency:
group: e2e-staging-reconciler
cancel-in-progress: false
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
# PR-validation path: always posts success so a workflow-only / script-only
# PR has a status check (this workflow's real job only fires on the paths
# filter). Mirrors the pr-validate job in e2e-staging-saas.yml.
pr-validate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 1
continue-on-error: true
- name: YAML validation (best-effort)
run: |
echo "e2e-staging-reconciler.yml — PR validation: workflow YAML is valid."
echo "Live E2E step runs only when the reconciler / script / libs change."
continue-on-error: true
e2e-staging-reconciler:
name: E2E Staging Reconciler
runs-on: ubuntu-latest
# NOT required yet — surface failures without blocking merges. Flip to
# false + add to branch protection once green on main for a de-flake
# window (see the header note). mc#1982: do not renew this mask silently.
continue-on-error: true
timeout-minutes: 60
permissions:
contents: read
env:
MOLECULE_CP_URL: https://staging-api.moleculesai.app
# Single admin-bearer secret drives provision + tenant-token retrieval +
# teardown (= Railway staging CP_ADMIN_API_TOKEN). Same secret name the
# saas workflow canonicalised to under internal#322.
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: us-east-2
# Leak-check is REQUIRED here: this test deliberately terminates an EC2,
# so teardown MUST positively confirm no slug-tagged box survives.
E2E_AWS_LEAK_CHECK: required
E2E_AWS_TERMINATE_LEAKS: '1'
# claude-code + MiniMax is the cheapest boot-to-online path (same as the
# saas job). The reconciler test never makes a completion, but the key is
# wired so the first boot reaches online on the same path the saas
# harness uses. First non-empty wins in the script's priority chain.
E2E_MINIMAX_API_KEY: ${{ secrets.MOLECULE_STAGING_MINIMAX_API_KEY }}
E2E_ANTHROPIC_API_KEY: ${{ secrets.MOLECULE_STAGING_ANTHROPIC_API_KEY }}
E2E_OPENAI_API_KEY: ${{ secrets.MOLECULE_STAGING_OPENAI_API_KEY }}
E2E_RUNTIME: claude-code
E2E_MODEL_SLUG: MiniMax-M2
E2E_RUN_ID: "${{ github.run_id }}-${{ github.run_attempt }}"
E2E_KEEP_ORG: ${{ github.event.inputs.keep_org && '1' || '0' }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Verify required secrets present
run: |
if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
echo "::error::CP_STAGING_ADMIN_API_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
exit 2
fi
for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
if [ -z "${!var:-}" ]; then
echo "::error::$var not set — this test terminates an EC2 and verifies no leak; AWS creds are mandatory"
exit 2
fi
done
echo "Required secrets present ✓"
- name: CP staging health preflight
run: |
code=$(curl -sS -o /dev/null -w "%{http_code}" --max-time 10 "$MOLECULE_CP_URL/health")
if [ "$code" != "200" ]; then
echo "::error::Staging CP unhealthy (got HTTP $code). Aborting: this is an environment failure, not a reconciler bug."
exit 1
fi
echo "Staging CP healthy ✓"
- name: Run reconciler heal E2E
id: e2e
run: bash tests/e2e/test_reconciler_heals_terminated_instance.sh
# Belt-and-braces teardown: the script installs its own EXIT trap, but if
# the runner is cancelled the trap may not fire. This always() step
# double-deletes any e2e-rec-* org from THIS run. The admin DELETE is
# idempotent so double-invoking is safe.
- name: Teardown safety net (runs on cancel/failure)
if: always()
env:
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
set +e
orgs=$(curl -sS "$MOLECULE_CP_URL/cp/admin/orgs" \
-H "Authorization: Bearer $ADMIN_TOKEN" 2>/dev/null \
| python3 -c "
import json, sys, os, datetime
run_id = os.environ.get('GITHUB_RUN_ID', '')
d = json.load(sys.stdin)
today = datetime.date.today()
yesterday = today - datetime.timedelta(days=1)
dates = (today.strftime('%Y%m%d'), yesterday.strftime('%Y%m%d'))
# Slug shape: e2e-rec-YYYYMMDD-<run_id>-<attempt>-...
if run_id:
prefixes = tuple(f'e2e-rec-{d}-{run_id}-' for d in dates)
else:
prefixes = tuple(f'e2e-rec-{d}-' for d in dates)
candidates = [o['slug'] for o in d.get('orgs', [])
if any(o.get('slug','').startswith(p) for p in prefixes)
and o.get('instance_status') not in ('purged',)]
print('\n'.join(candidates))
" 2>/dev/null)
leaks=()
for slug in $orgs; do
echo "Safety-net teardown: $slug"
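# errexit is already off for this step (set +e above), so a failed curl
# falls through to the HTTP-code check below instead of aborting the loop.
curl -sS -o /tmp/rec-cleanup.out -w "%{http_code}" \
-X DELETE "$MOLECULE_CP_URL/cp/admin/tenants/$slug" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"confirm\":\"$slug\"}" >/tmp/rec-cleanup.code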
code=$(cat /tmp/rec-cleanup.code 2>/dev/null || echo "000")
if [ "$code" = "200" ] || [ "$code" = "204" ]; then
echo "[teardown] deleted $slug (HTTP $code)"
else
echo "::warning::reconciler teardown for $slug returned HTTP $code — sweep-stale-e2e-orgs will catch it within ~45 min. Body: $(head -c 300 /tmp/rec-cleanup.out 2>/dev/null)"
leaks+=("$slug")
fi
done
if [ ${#leaks[@]} -gt 0 ]; then
echo "::warning::reconciler teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
fi
exit 0
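
The slug filter in the teardown safety net above can be restated outside the inline Python. This is a minimal sketch, not part of the workflow: `OrgRow`, `yyyymmdd`, and `teardownCandidates` are hypothetical names, and the date is computed in UTC here as an assumption (the workflow's Python uses the runner's local date).

```typescript
// Reduced shape of one entry in the admin orgs listing (assumed for this sketch).
interface OrgRow {
  slug?: string;
  instance_status?: string;
}

// Formats a date as YYYYMMDD in UTC.
function yyyymmdd(d: Date): string {
  return d.toISOString().slice(0, 10).replace(/-/g, "");
}

// Selects the slugs this run may have created. Both today's and yesterday's
// date stamps are accepted, so a run that crosses midnight is still matched.
function teardownCandidates(orgs: OrgRow[], runId: string, now: Date): string[] {
  const yesterday = new Date(now.getTime() - 24 * 60 * 60 * 1000);
  const days = [yyyymmdd(now), yyyymmdd(yesterday)];
  // With a run id the match is scoped to this run; without one it falls back
  // to every e2e-rec-* slug from the two-day window.
  const prefixes = days.map((day) =>
    runId ? `e2e-rec-${day}-${runId}-` : `e2e-rec-${day}-`,
  );
  return orgs
    .filter((o) => {
      const slug = o.slug ?? "";
      return (
        prefixes.some((p) => slug.startsWith(p)) &&
        o.instance_status !== "purged"
      );
    })
    .map((o) => o.slug ?? "");
}
```

Scoping the prefix to the run id is what lets concurrent or earlier runs keep their orgs; the unscoped fallback only applies when no run id is available.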
+34 -15
@@ -10,6 +10,7 @@ import {
buildProviderCatalog,
buildProviderCatalogFromRegistry,
findProviderForModel,
isPlatformManagedProvider,
type SelectorModel,
type SelectorValue,
type RegistryProvider,
@@ -290,7 +291,15 @@ export function CreateWorkspaceButton() {
setError("Model is required");
return;
}
if (!isExternal && selectedLLMProvider?.envVars.length && !llmSecret.trim()) {
// Platform-managed providers need NO user credential — the platform injects
// its own usage token (MOLECULE_LLM_USAGE_TOKEN = tenant admin_token) at
// provision time. Only BYOK providers require a user-supplied key. (#2245)
if (
!isExternal &&
!isPlatformManagedProvider(selectedLLMProvider) &&
selectedLLMProvider?.envVars.length &&
!llmSecret.trim()
) {
setError("Provider credential is required");
return;
}
@@ -325,7 +334,11 @@ export function CreateWorkspaceButton() {
? {
model: llmSelection.model.trim(),
llm_provider: nativeProvider.vendor,
...(nativeProvider.envVars.length > 0
// Only BYOK providers carry a user secret. For platform-managed
// the token is provisioner-injected; sending an (empty) secret
// here would clobber it — so omit it entirely. (#2245)
...(nativeProvider.envVars.length > 0 &&
!isPlatformManagedProvider(nativeProvider)
? { secrets: { [nativeProvider.envVars[0]]: llmSecret.trim() } }
: {}),
}
@@ -521,20 +534,26 @@ export function CreateWorkspaceButton() {
idPrefix="create-workspace-llm"
variant="stack"
/>
{selectedLLMProvider.envVars.length > 0 && (
<div>
<label htmlFor="llm-secret-input" className="text-[11px] text-ink-mid block mb-1">
{selectedLLMProvider.envVars[0]}
</label>
<input
id="llm-secret-input"
type="password"
value={llmSecret}
onChange={(e) => setLLMSecret(e.target.value)}
autoComplete="off"
className="w-full bg-surface-card/60 border border-line/50 rounded-lg px-3 py-2 text-sm text-ink placeholder-ink-soft focus:outline-none focus:border-accent/60 focus:ring-1 focus:ring-accent/20 transition-colors font-mono"
/>
{isPlatformManagedProvider(selectedLLMProvider) ? (
<div className="text-[11px] text-ink-soft">
Platform-managed &mdash; no API key required.
</div>
) : (
selectedLLMProvider.envVars.length > 0 && (
<div>
<label htmlFor="llm-secret-input" className="text-[11px] text-ink-mid block mb-1">
{selectedLLMProvider.envVars[0]}
</label>
<input
id="llm-secret-input"
type="password"
value={llmSecret}
onChange={(e) => setLLMSecret(e.target.value)}
autoComplete="off"
className="w-full bg-surface-card/60 border border-line/50 rounded-lg px-3 py-2 text-sm text-ink placeholder-ink-soft focus:outline-none focus:border-accent/60 focus:ring-1 focus:ring-accent/20 transition-colors font-mono"
/>
</div>
)
)}
</div>
)}
@@ -55,6 +55,21 @@ export interface ProviderEntry {
billingMode?: "platform_managed" | "byok";
}
/** A provider is "platform-managed" when the Molecule platform proxies the LLM
* call and injects its own usage credential — the tenant admin_token, surfaced
* to the workspace as MOLECULE_LLM_USAGE_TOKEN by the CP provisioner
* (controlplane ec2.go: `MOLECULE_LLM_USAGE_TOKEN="$ADMIN_TOKEN"`). The user
* supplies NO key for these: the credential is internal plumbing, not a user
* input. Detected by vendor==="platform" (the platform proxy provider, which
* declares MOLECULE_LLM_USAGE_TOKEN in its AuthEnv) OR
* billingMode==="platform_managed" (registry-backed, internal#718 P3). BYOK
* providers return false and DO require a user-supplied credential. */
export function isPlatformManagedProvider(
p?: Pick<ProviderEntry, "vendor" | "billingMode"> | null,
): boolean {
return p?.vendor === "platform" || p?.billingMode === "platform_managed";
}
/** RegistryProvider mirrors one entry of GET /templates `registry_providers`
* (workspace-server registryProviderView): the registry's native provider for
* a runtime, with its display label, auth-env NAMES, and billing mode. This is
@@ -2,6 +2,7 @@
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { render, screen, fireEvent, waitFor, cleanup } from "@testing-library/react";
import { CreateWorkspaceButton } from "../CreateWorkspaceDialog";
import { isPlatformManagedProvider } from "../ProviderModelSelector";
vi.mock("@/lib/api", () => ({
api: {
@@ -65,6 +66,34 @@ const SAMPLE_TEMPLATES = [
{ id: "moonshot/kimi-k2.6", name: "Kimi K2.6", provider: "platform", required_env: [] },
],
},
// #2245 fixtures. The real registry `platform` provider declares
// MOLECULE_LLM_USAGE_TOKEN in its auth_env — the default mock above masks the
// bug by using required_env:[]. This template gives the platform provider a
// non-empty auth env (matching production) so the credential-suppression
// logic is actually exercised.
{
id: "platform-managed-test",
name: "Platform Managed Test",
runtime: "claude-code",
model: "moonshot/kimi-k2.6",
providers: ["platform", "minimax"],
models: [
{ id: "moonshot/kimi-k2.6", name: "Kimi K2.6", provider: "platform", required_env: ["MOLECULE_LLM_USAGE_TOKEN"] },
{ id: "MiniMax-M2.7", name: "MiniMax M2.7", required_env: ["MINIMAX_API_KEY"] },
],
},
// BYOK-only template (no platform provider) — the credential requirement
// MUST still hold for these (no-regression guard).
{
id: "byok-only-test",
name: "BYOK Only Test",
runtime: "claude-code",
model: "openai/gpt-4o",
providers: ["openai"],
models: [
{ id: "openai/gpt-4o", name: "GPT-4o", required_env: ["OPENAI_API_KEY"] },
],
},
];
beforeEach(() => {
@@ -498,6 +527,25 @@ const REGISTRY_TEMPLATE = {
],
};
// Registry-backed platform provider WITH a non-empty auth_env. This matches
// the PRODUCTION provider view, which ships the raw AuthEnv
// ([MOLECULE_LLM_USAGE_TOKEN]). REGISTRY_TEMPLATE above uses auth_env:[] so it
// never exercises suppression; this one drives the
// billingMode==="platform_managed" branch end-to-end through
// buildProviderCatalogFromRegistry. (#2245)
const REGISTRY_TEMPLATE_PLATFORM_AUTHENV = {
...REGISTRY_TEMPLATE,
registry_providers: [
{
name: "platform",
display_name: "Platform",
auth_env: ["MOLECULE_LLM_USAGE_TOKEN"],
billing_mode: "platform_managed",
},
{ name: "minimax", display_name: "MiniMax", auth_env: ["MINIMAX_API_KEY"], billing_mode: "byok" },
{ name: "anthropic", display_name: "Anthropic API", auth_env: ["ANTHROPIC_API_KEY"], billing_mode: "byok" },
],
};
describe("CreateWorkspaceDialog — registry-backed provider catalog (RFC#340 Fix C)", () => {
beforeEach(() => {
mockGet.mockImplementation(async (url: string) => {
@@ -574,6 +622,41 @@ describe("CreateWorkspaceDialog — registry-backed provider catalog (RFC#340 Fi
expect(body.llm_provider).toBe("minimax");
expect(body.secrets).toEqual({ MINIMAX_API_KEY: "sk-minimax-test" });
});
it("suppresses the credential for a registry-backed platform provider that declares an auth_env — billingMode path (#2245)", async () => {
// Override the default REGISTRY_TEMPLATE (auth_env:[]) with the production-
// shaped one whose platform provider declares MOLECULE_LLM_USAGE_TOKEN.
mockGet.mockImplementation(async (url: string) => {
if (url === "/templates") {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return [REGISTRY_TEMPLATE_PLATFORM_AUTHENV] as any;
}
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return SAMPLE_WORKSPACES as any;
});
await openDialog();
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
target: { value: "Registry Platform Agent" },
});
// Platform is the default bucket; even with a non-empty auth_env the key
// field must NOT render (suppressed via billingMode==="platform_managed").
await waitFor(() => {
const sel = document.querySelector("[data-testid='provider-select']") as HTMLSelectElement;
expect(sel?.value).toBe("registry|platform");
});
expect(screen.getByText("Platform-managed — no API key required.")).toBeTruthy();
expect(document.getElementById("llm-secret-input")).toBeNull();
const createBtn = screen.getAllByRole("button").find((b) => b.textContent === "Create");
fireEvent.click(createBtn!);
await waitFor(() => expect(mockPost).toHaveBeenCalled());
expect(screen.queryByText("Provider credential is required")).toBeNull();
const body = mockPost.mock.calls[0][1] as Record<string, unknown>;
expect(body.llm_provider).toBe("platform");
// The provisioner-injected MOLECULE_LLM_USAGE_TOKEN must NOT be clobbered.
expect(body.secrets).toBeUndefined();
});
});
// ---------------------------------------------------------------------------
@@ -657,3 +740,70 @@ describe("CreateWorkspaceDialog — budget_limit field", () => {
expect(budgetInput.value).toBe("");
});
});
describe("CreateWorkspaceDialog — platform-managed credential suppression (#2245)", () => {
describe("isPlatformManagedProvider", () => {
it("is true for the platform proxy vendor", () => {
expect(isPlatformManagedProvider({ vendor: "platform" })).toBe(true);
});
it("is true for a registry billingMode of platform_managed", () => {
expect(
isPlatformManagedProvider({ vendor: "minimax", billingMode: "platform_managed" }),
).toBe(true);
});
it("is false for a BYOK provider", () => {
expect(isPlatformManagedProvider({ vendor: "anthropic", billingMode: "byok" })).toBe(false);
expect(isPlatformManagedProvider({ vendor: "minimax" })).toBe(false);
});
it("is false for null/undefined", () => {
expect(isPlatformManagedProvider(null)).toBe(false);
expect(isPlatformManagedProvider(undefined)).toBe(false);
});
});
it("platform-managed provider with a declared auth env requires NO credential, hides the key field, and sends NO secret", async () => {
await openDialog();
await setTemplate("platform-managed-test");
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
target: { value: "Platform Agent" },
});
// The credential input must NOT render for platform-managed; a "no key
// required" note appears instead.
await waitFor(() =>
expect(screen.getByText("Platform-managed — no API key required.")).toBeTruthy(),
);
expect(screen.queryByLabelText("MOLECULE_LLM_USAGE_TOKEN")).toBeNull();
const createBtn = screen.getAllByRole("button").find((b) => b.textContent === "Create");
fireEvent.click(createBtn!);
await waitFor(() => expect(mockPost).toHaveBeenCalled());
// No validation error, and the provisioner-injected token is NOT clobbered
// by an empty secret.
expect(screen.queryByText("Provider credential is required")).toBeNull();
const body = mockPost.mock.calls[0][1] as Record<string, unknown>;
expect(body.llm_provider).toBe("platform");
expect(body.secrets).toBeUndefined();
});
it("BYOK provider still requires a credential and renders the key field (no-regression)", async () => {
await openDialog();
await setTemplate("byok-only-test");
fireEvent.change(screen.getByPlaceholderText("e.g. SEO Agent"), {
target: { value: "BYOK Agent" },
});
// The credential field IS rendered for BYOK...
await waitFor(() => expect(screen.getByLabelText("OPENAI_API_KEY")).toBeTruthy());
const createBtn = screen.getAllByRole("button").find((b) => b.textContent === "Create");
fireEvent.click(createBtn!);
// ...and create stays blocked until it's filled.
await waitFor(() =>
expect(screen.getByText("Provider credential is required")).toBeTruthy(),
);
expect(mockPost).not.toHaveBeenCalled();
});
});
+493
@@ -0,0 +1,493 @@
#!/usr/bin/env bash
# Live staging E2E — the CP instance-state reconciler heals a terminated EC2.
#
# Real-infra complement to the deterministic unit tests for core#2261
# (workspace-server/internal/registry/cp_instance_reconciler.go). Those unit
# tests pin the reconcile logic against fakes; THIS script proves the loop
# actually runs in a real tenant's workspace-server and drives the EXISTING
# offline + auto-heal machinery against real AWS.
#
# Root regression (core#2247): a SaaS workspace whose EC2 is terminated out
# from under the platform (manual AWS action, spot reclaim, CP reap) fell
# through every existing liveness pass and kept reading status='online'
# forever, pointing at a dead instance. The reconciler closes that gap with
# CPProvisioner.IsRunning and feeds a clean "not running" into onOffline →
# RestartByID (existing-volume reprovision).
#
# What this test does:
# 1. Provision a fresh staging org + ONE workspace (same default
# runtime/model as the full-saas harness, so it actually boots).
# 2. Poll the tenant API until the workspace is status=online; capture its
# instance_id.
# 3. KILL it — terminate that exact EC2 via `aws ec2 terminate-instances`.
# 4. Assert the reconciler heals it:
# PRIMARY (gate) — within ~180s the workspace status LEAVES
# 'online' (the reconciler detected the dead
# instance via IsRunning and flipped it). This
# is the core regression guard: a dead instance
# must NOT keep reading 'online'.
# SECONDARY (best-effort) — within ~10 min it auto-reprovisions:
# status returns to 'online' with a NEW
# instance_id (onOffline → RestartByID
# existing-volume heal). If reprovision doesn't
# finish in the bound we log it clearly but let
# the PRIMARY assertion stand as the gate (see
# the comment at the secondary block — a future
# tightening that promotes this to a hard gate is
# deliberately one edit away).
# 5. Teardown ALWAYS (EXIT trap): delete the tenant + leak-sweep so no EC2
# is orphaned, even on a mid-test failure.
#
# Auth model + provisioning conventions are copied verbatim from
# test_staging_full_saas.sh (single MOLECULE_ADMIN_TOKEN → CP admin; per-
# tenant admin token + X-Molecule-Org-Id header for tenant API). The kill
# primitive + leak sweep reuse lib/aws_leak_check.sh.
#
# Required env:
# MOLECULE_CP_URL default: https://staging-api.moleculesai.app
# MOLECULE_ADMIN_TOKEN CP admin bearer — Railway staging CP_ADMIN_API_TOKEN
#
# Optional env (mirrors the full-saas harness where they overlap):
# E2E_RUNTIME claude-code (default)
# E2E_PROVISION_TIMEOUT_SECS default 900 (cold EC2 budget)
# E2E_WORKSPACE_ONLINE_TIMEOUT_SECS default 3600 (cold-boot worst-case)
# E2E_RECONCILE_OFFLINE_TIMEOUT_SECS default 180 (PRIMARY: leave 'online'.
# Reconciler cadence is 60s, so 180s spans 3 cycles:
# one to detect, the rest as AWS terminate-visibility slack.)
# E2E_REPROVISION_TIMEOUT_SECS default 600 (SECONDARY: back to online
# with a NEW instance_id)
# E2E_MINIMAX_API_KEY / E2E_ANTHROPIC_API_KEY / E2E_OPENAI_API_KEY
# LLM key (same priority chain as
# full-saas; needed so the FIRST boot
# reaches online). Empty → '{}' (the
# workspace still boots online; the LLM
# key only matters for a completion,
# which this test never makes).
# E2E_KEEP_ORG 1 → skip teardown (debugging only)
# E2E_RUN_ID Slug suffix; CI: ${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}
# E2E_AWS_LEAK_CHECK auto (default) | required | off
# E2E_AWS_TERMINATE_LEAKS 1 → terminate slug-tagged leaked EC2 at
# teardown
#
# Exit codes:
# 0 happy path (PRIMARY assertion held; SECONDARY logged either way)
# 1 generic failure (incl. PRIMARY assertion failed = regression)
# 2 missing required env
# 3 provisioning timed out
# 4 teardown left orphan resources
set -euo pipefail
CP_URL="${MOLECULE_CP_URL:-https://staging-api.moleculesai.app}"
ADMIN_TOKEN="${MOLECULE_ADMIN_TOKEN:?MOLECULE_ADMIN_TOKEN required — Railway staging CP_ADMIN_API_TOKEN}"
RUNTIME="${E2E_RUNTIME:-claude-code}"
PROVISION_TIMEOUT_SECS="${E2E_PROVISION_TIMEOUT_SECS:-900}"
WORKSPACE_ONLINE_TIMEOUT_SECS="${E2E_WORKSPACE_ONLINE_TIMEOUT_SECS:-3600}"
# PRIMARY bound: the reconciler ticks every 60s; it needs one cycle to see
# the dead instance after AWS makes the terminate visible to DescribeInstances
# (typically seconds, but can lag). 180s = ~3 cycles + slack.
RECONCILE_OFFLINE_TIMEOUT_SECS="${E2E_RECONCILE_OFFLINE_TIMEOUT_SECS:-180}"
# SECONDARY bound: full existing-volume reprovision (new EC2 boot + agent
# bootstrap) is a multi-minute cold path.
REPROVISION_TIMEOUT_SECS="${E2E_REPROVISION_TIMEOUT_SECS:-600}"
RUN_ID_SUFFIX="${E2E_RUN_ID:-$(date +%H%M%S)-$$}"
# Slug MUST start with e2e- so sweep-stale-e2e-orgs.yml reaps any orphan this
# run leaks (lint_cleanup_traps.sh enforces the e2e-/rt-e2e- prefix for any
# staging tenant E2E; we honour it here too even though our filename isn't
# *staging*).
SLUG="e2e-rec-$(date +%Y%m%d)-${RUN_ID_SUFFIX}"
SLUG=$(echo "$SLUG" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9-' | head -c 32)
log() { echo "[$(date +%H:%M:%S)] $*"; }
fail() { echo "[$(date +%H:%M:%S)] ❌ $*" >&2; exit 1; }
ok() { echo "[$(date +%H:%M:%S)] ✅ $*"; }
# Per-runtime model slug dispatch — shared with the full-saas harness.
# shellcheck disable=SC1091
# shellcheck source=lib/model_slug.sh
source "$(dirname "$0")/lib/model_slug.sh"
# AWS kill primitive + leak sweep (e2e_aws_region / e2e_ec2_instances_for_slug /
# e2e_terminate_instances / e2e_verify_no_ec2_leaks_for_slug).
# shellcheck disable=SC1091
# shellcheck source=lib/aws_leak_check.sh
source "$(dirname "$0")/lib/aws_leak_check.sh"
CURL_COMMON=(-sS --fail-with-body --max-time 30)
# ─── cleanup trap ───────────────────────────────────────────────────────
# Identical teardown contract to test_staging_full_saas.sh: delete the
# tenant (synchronous GDPR cascade), poll for the org row to disappear, then
# assert no slug-tagged EC2 survives. A leaked resource at teardown is a CI
# failure (exit 4). The trap is installed UP-FRONT so a mid-test failure
# (including a failed PRIMARY assertion) still cleans up.
CLEANUP_DONE=0
cleanup_org() {
# Capture upstream exit code IMMEDIATELY — must be the first statement in
# the trap, before any command (including the CLEANUP_DONE check) clobbers $?.
local entry_rc=$?
if [ "$CLEANUP_DONE" = "1" ]; then return 0; fi
CLEANUP_DONE=1
if [ "${E2E_KEEP_ORG:-0}" = "1" ]; then
log "E2E_KEEP_ORG=1 — skipping teardown. Manually delete $SLUG when done."
return 0
fi
log "🧹 Tearing down org $SLUG..."
# 120s curl budget for the synchronous DELETE cascade (EC2 terminate alone
# is 30-60s), then poll up to 60s for organizations.status='purged'/gone.
if curl "${CURL_COMMON[@]}" --max-time 120 -X DELETE "$CP_URL/cp/admin/tenants/$SLUG" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"confirm\":\"$SLUG\"}" >/dev/null 2>&1; then
ok "Teardown request accepted"
else
log "Teardown returned non-2xx (may already be gone)"
fi
local leak_count=1
local elapsed=0
while [ "$elapsed" -lt 60 ]; do
leak_count=$(curl "${CURL_COMMON[@]}" "$CP_URL/cp/admin/orgs" \
-H "Authorization: Bearer $ADMIN_TOKEN" 2>/dev/null \
| python3 -c "import json,sys; d=json.load(sys.stdin); print(sum(1 for o in d.get('orgs', []) if o.get('slug')=='$SLUG' and o.get('status') != 'purged'))" \
2>/dev/null || echo 1)
if [ "$leak_count" = "0" ]; then
break
fi
sleep 5
elapsed=$((elapsed + 5))
done
if [ "$leak_count" != "0" ]; then
echo "⚠️ LEAK: org $SLUG still present post-teardown after ${elapsed}s (count=$leak_count)" >&2
exit 4
fi
local aws_leak_rc=0
e2e_verify_no_ec2_leaks_for_slug "$SLUG" || aws_leak_rc=$?
if [ "$aws_leak_rc" != "0" ]; then
case "$aws_leak_rc" in
2) exit 2 ;;
*) exit 4 ;;
esac
fi
ok "Teardown clean — no orphan org or EC2 resources for $SLUG (${elapsed}s)"
# Normalize unexpected upstream exit codes to 1 — `set -e` propagates the
# raw exit code of the failing command (e.g. curl exits 22 under
# --fail-with-body), but this script's contract only emits {0,1,2,3,4}.
case "$entry_rc" in
0|1|2|3|4) ;;
*) exit 1 ;;
esac
}
trap cleanup_org EXIT INT TERM
# ─── 0. Preflight ───────────────────────────────────────────────────────
log "═══════════════════════════════════════════════════════════════════"
log " Staging reconciler-heals-terminated-instance E2E (core#2261)"
log " CP: $CP_URL"
log " Slug: $SLUG"
log " Runtime: $RUNTIME"
log " Online timeout: ${WORKSPACE_ONLINE_TIMEOUT_SECS}s"
log " PRIMARY (offline): ${RECONCILE_OFFLINE_TIMEOUT_SECS}s"
log " SECONDARY (reprov): ${REPROVISION_TIMEOUT_SECS}s"
log "═══════════════════════════════════════════════════════════════════"
log "0/6 Preflight: CP reachable?"
curl "${CURL_COMMON[@]}" "$CP_URL/health" >/dev/null || fail "CP health check failed"
ok "CP reachable"
admin_call() {
local method="$1"; shift
local path="$1"; shift
curl "${CURL_COMMON[@]}" -X "$method" "$CP_URL$path" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
"$@"
}
# ─── 1. Create org ──────────────────────────────────────────────────────
log "1/6 Creating org $SLUG via /cp/admin/orgs..."
CREATE_RESP=$(admin_call POST /cp/admin/orgs \
-d "{\"slug\":\"$SLUG\",\"name\":\"E2E $SLUG\",\"owner_user_id\":\"e2e-runner:$SLUG\"}")
echo "$CREATE_RESP" | python3 -m json.tool >/dev/null || fail "Org create returned non-JSON: $CREATE_RESP"
ORG_ID=$(echo "$CREATE_RESP" | python3 -c "import json,sys; print(json.load(sys.stdin).get('id',''))")
[ -z "$ORG_ID" ] && fail "Org create response missing 'id': $CREATE_RESP"
ok "Org created (id=$ORG_ID)"
# ─── 2. Wait for tenant provisioning ────────────────────────────────────
log "2/6 Waiting for tenant provisioning (up to ${PROVISION_TIMEOUT_SECS}s)..."
DEADLINE=$(( $(date +%s) + PROVISION_TIMEOUT_SECS ))
LAST_STATUS=""
while true; do
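if [ "$(date +%s)" -gt "$DEADLINE" ]; then
# Exit 3 (provisioning timed out) per the documented exit codes; fail() would exit 1.
echo "[$(date +%H:%M:%S)] ❌ Tenant provisioning timed out after ${PROVISION_TIMEOUT_SECS}s (last: $LAST_STATUS)" >&2
exit 3
fi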
LIST_JSON=$(admin_call GET /cp/admin/orgs 2>/dev/null || echo '{"orgs":[]}')
# /cp/admin/orgs exposes 'instance_status' (org_instances.status), NOT 'status'.
STATUS=$(echo "$LIST_JSON" | python3 -c "
import json, sys
d = json.load(sys.stdin)
for o in d.get('orgs', []):
    if o.get('slug') == '$SLUG':
        print(o.get('instance_status', ''))
        sys.exit(0)
print('')
" 2>/dev/null || echo "")
if [ "$STATUS" != "$LAST_STATUS" ]; then
log " status → $STATUS"
LAST_STATUS="$STATUS"
fi
case "$STATUS" in
running) break ;;
failed)
log "── DIAGNOSTIC BURST (step 2 — tenant provisioning failed) ──"
echo "$LIST_JSON" | python3 -c "
import json, sys
d = json.load(sys.stdin)
for o in d.get('orgs', []):
    if o.get('slug') == '$SLUG':
        print(json.dumps(o, indent=2))
        sys.exit(0)
print('(no org row found for slug=$SLUG — DB drift?)')
" 2>&1 | sed 's/^/ /'
log "── END DIAGNOSTIC ──"
# Tenant provisioning failures are a CP-side fault, not a reconciler
# regression — exit 3 (provisioning) to keep the signal honest.
echo "[$(date +%H:%M:%S)] ❌ Tenant provisioning failed for $SLUG (see diagnostic above)" >&2
exit 3
;;
*) sleep 15 ;;
esac
done
ok "Tenant provisioning complete"
# Derive tenant domain from CP hostname (same logic as the full-saas harness).
CP_HOST=$(echo "$CP_URL" | sed -E 's#^https?://##; s#/.*$##')
case "$CP_HOST" in
api.*) DERIVED_DOMAIN="${CP_HOST#api.}" ;;
staging-api.*) DERIVED_DOMAIN="staging.${CP_HOST#staging-api.}" ;;
*) DERIVED_DOMAIN="$CP_HOST" ;;
esac
TENANT_DOMAIN="${MOLECULE_TENANT_DOMAIN:-$DERIVED_DOMAIN}"
TENANT_URL="https://$SLUG.$TENANT_DOMAIN"
log " TENANT_URL=$TENANT_URL"
# ─── 3. Retrieve per-tenant admin token ────────────────────────────────
log "3/6 Fetching per-tenant admin token..."
TENANT_TOKEN_RESP=$(admin_call GET "/cp/admin/orgs/$SLUG/admin-token")
TENANT_TOKEN=$(echo "$TENANT_TOKEN_RESP" | python3 -c "import json,sys; print(json.load(sys.stdin).get('admin_token',''))" 2>/dev/null || echo "")
[ -z "$TENANT_TOKEN" ] && fail "Could not retrieve per-tenant admin token for $SLUG"
ok "Tenant admin token retrieved (len=${#TENANT_TOKEN})"
# Wait for tenant TLS / DNS propagation before any tenant API call.
log " Waiting for tenant TLS / DNS propagation..."
TLS_DEADLINE=$(( $(date +%s) + 15 * 60 ))
while true; do
if curl -sSfk --max-time 5 "$TENANT_URL/health" >/dev/null 2>&1; then
break
fi
if [ "$(date +%s)" -gt "$TLS_DEADLINE" ]; then
fail "Tenant URL never responded 2xx on /health within 15m"
fi
sleep 5
done
ok "Tenant reachable at $TENANT_URL"
tenant_call() {
local method="$1"; shift
local path="$1"; shift
# X-Molecule-Org-Id is REQUIRED — the tenant guard 404s anything without it
# (it does NOT 403, to hide tenant existence from org scanners).
curl "${CURL_COMMON[@]}" -X "$method" "$TENANT_URL$path" \
-H "Authorization: Bearer $TENANT_TOKEN" \
-H "X-Molecule-Org-Id: $ORG_ID" \
"$@"
}
# Helper: read a single field off GET /workspaces/<id>. Echoes '' on any
# error so callers can poll without `set -e` aborting on a transient blip.
ws_field() {
local wid="$1"; local field="$2"
tenant_call GET "/workspaces/$wid" 2>/dev/null \
| python3 -c "import json,sys; print(json.load(sys.stdin).get('$field') or '')" 2>/dev/null \
|| echo ""
}
# ─── 4. Provision ONE workspace ─────────────────────────────────────────
# Same secrets-injection priority chain as the full-saas harness so the
# FIRST boot reaches online. We never make a completion in this test (the
# whole exercise is instance-state, not the LLM), so an absent key is
# tolerable — but wiring the same keys keeps boot behaviour identical to the
# sibling and avoids a config path that only this test would exercise.
SECRETS_JSON='{}'
if [ -n "${E2E_MINIMAX_API_KEY:-}" ]; then
SECRETS_JSON=$(python3 -c "import json,os; print(json.dumps({'MINIMAX_API_KEY': os.environ['E2E_MINIMAX_API_KEY']}))")
elif [ -n "${E2E_ANTHROPIC_API_KEY:-}" ]; then
SECRETS_JSON=$(python3 -c "import json,os; print(json.dumps({'ANTHROPIC_API_KEY': os.environ['E2E_ANTHROPIC_API_KEY']}))")
elif [ -n "${E2E_OPENAI_API_KEY:-}" ]; then
SECRETS_JSON=$(python3 -c "
import json, os
k = os.environ['E2E_OPENAI_API_KEY']
print(json.dumps({
    'OPENAI_API_KEY': k,
    'OPENAI_BASE_URL': 'https://api.openai.com/v1',
    'MODEL_PROVIDER': 'openai:gpt-4o',
    'HERMES_INFERENCE_PROVIDER': 'custom',
    'HERMES_CUSTOM_BASE_URL': 'https://api.openai.com/v1',
    'HERMES_CUSTOM_API_KEY': k,
    'HERMES_CUSTOM_API_MODE': 'chat_completions',
}))
")
fi
MODEL_SLUG=$(pick_model_slug "$RUNTIME")
log " MODEL_SLUG=$MODEL_SLUG"
log "4/6 Provisioning workspace (runtime=$RUNTIME)..."
WS_RESP=$(tenant_call POST /workspaces \
-H "Content-Type: application/json" \
-d "{\"name\":\"E2E Reconciler\",\"runtime\":\"$RUNTIME\",\"tier\":2,\"model\":\"$MODEL_SLUG\",\"secrets\":$SECRETS_JSON}")
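# Use .get() and swallow parse errors so a missing 'id' reaches the explicit
# check below instead of aborting on a KeyError under `set -e`/pipefail.
WS_ID=$(echo "$WS_RESP" | python3 -c "import json,sys; print(json.load(sys.stdin).get('id',''))" 2>/dev/null || echo "")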
[ -z "$WS_ID" ] && fail "Workspace create response missing 'id': $WS_RESP"
log " WS_ID=$WS_ID"
# Wait for the workspace to reach status=online and capture its instance_id.
log " Waiting for workspace to reach status=online (up to $((WORKSPACE_ONLINE_TIMEOUT_SECS/60)) min)..."
ONLINE_DEADLINE=$(( $(date +%s) + WORKSPACE_ONLINE_TIMEOUT_SECS ))
ORIGINAL_INSTANCE_ID=""
WS_LAST_STATUS=""
while true; do
if [ "$(date +%s)" -gt "$ONLINE_DEADLINE" ]; then
WS_LAST_ERR=$(ws_field "$WS_ID" "last_sample_error")
fail "Workspace $WS_ID never reached status=online within ${WORKSPACE_ONLINE_TIMEOUT_SECS}s (last status=$WS_LAST_STATUS, err=$WS_LAST_ERR)"
fi
WS_STATUS=$(ws_field "$WS_ID" "status")
if [ "$WS_STATUS" != "$WS_LAST_STATUS" ]; then
log " $WS_ID status → ${WS_STATUS:-<empty>}"
WS_LAST_STATUS="$WS_STATUS"
fi
if [ "$WS_STATUS" = "online" ]; then
ORIGINAL_INSTANCE_ID=$(ws_field "$WS_ID" "instance_id")
if [ -n "$ORIGINAL_INSTANCE_ID" ]; then
break
fi
# online but instance_id not surfaced yet — keep polling briefly.
log " $WS_ID online but instance_id not populated yet — waiting"
fi
# 'failed' is transient on cold boot (bootstrap-watcher deadline vs heartbeat
# recovery, cp#245). Keep polling; only the deadline hard-fails.
sleep 10
done
ok "Workspace online (instance_id=$ORIGINAL_INSTANCE_ID)"
# ─── 5. Kill the EC2 ────────────────────────────────────────────────────
# Terminate the EXACT instance the workspace reported. Prefer the captured
# instance_id (precise — kills only this workspace's box); fall back to the
# slug-tag describe if the API didn't surface an id (shouldn't happen — we
# only break out of the online-wait once instance_id is non-empty).
log "5/6 KILLING the workspace EC2 to simulate an out-of-band termination..."
if ! e2e_aws_creds_available; then
fail "AWS CLI/creds unavailable — cannot terminate the EC2 to exercise the reconciler. Set AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY (the CI workflow wires these)."
fi
AWS_REGION_RESOLVED=$(e2e_aws_region)
if [ -n "$ORIGINAL_INSTANCE_ID" ]; then
log " Terminating $ORIGINAL_INSTANCE_ID in $AWS_REGION_RESOLVED (aws ec2 terminate-instances)..."
aws ec2 terminate-instances --region "$AWS_REGION_RESOLVED" --instance-ids "$ORIGINAL_INSTANCE_ID" >/dev/null \
|| fail "aws ec2 terminate-instances failed for $ORIGINAL_INSTANCE_ID"
KILLED_IDS="$ORIGINAL_INSTANCE_ID"
else
# Fallback path — find by slug tag and terminate.
log " instance_id was empty — falling back to slug-tag describe ($SLUG)..."
ROWS=$(e2e_ec2_instances_for_slug "$SLUG" 2>/dev/null || echo "")
KILLED_IDS=$(echo "$ROWS" | awk 'NF {print $1}' | sort -u | tr '\n' ' ')
[ -n "$KILLED_IDS" ] || fail "No slug-tagged EC2 found for $SLUG — nothing to terminate"
log " Terminating $KILLED_IDS in $AWS_REGION_RESOLVED..."
e2e_terminate_instances "$KILLED_IDS" || fail "terminate-instances failed for $KILLED_IDS"
fi
ok "Terminated EC2: $KILLED_IDS — reconciler should now detect the dead instance"
# ─── 6a. PRIMARY assertion — workspace leaves 'online' ─────────────────
# This is THE regression gate for core#2261/#2247. The reconciler runs every
# 60s in the tenant's workspace-server; when CPProvisioner.IsRunning returns a
# clean "not running" for the terminated EC2, onOffline flips the row off
# 'online'. A dead instance that keeps reading 'online' is exactly the bug.
log "6a/6 PRIMARY: asserting workspace leaves 'online' within ${RECONCILE_OFFLINE_TIMEOUT_SECS}s (reconciler heal-detection)..."
OFFLINE_DEADLINE=$(( $(date +%s) + RECONCILE_OFFLINE_TIMEOUT_SECS ))
LEFT_ONLINE=0
REC_LAST_STATUS=""
while true; do
if [ "$(date +%s)" -gt "$OFFLINE_DEADLINE" ]; then
break
fi
REC_STATUS=$(ws_field "$WS_ID" "status")
if [ "$REC_STATUS" != "$REC_LAST_STATUS" ]; then
log " $WS_ID status → ${REC_STATUS:-<empty>}"
REC_LAST_STATUS="$REC_STATUS"
fi
# Any non-online status (offline/provisioning/awaiting_agent/restarting/…)
# proves the reconciler acted. We deliberately don't pin the exact target
# status: onOffline flips offline AND kicks RestartByID, so the row may race
# straight into a provisioning/restarting state — all of which are "no longer
# falsely online".
if [ -n "$REC_STATUS" ] && [ "$REC_STATUS" != "online" ]; then
LEFT_ONLINE=1
ok "PRIMARY held — workspace left 'online' (now '$REC_STATUS') after EC2 termination"
break
fi
sleep 10
done
if [ "$LEFT_ONLINE" != "1" ]; then
fail "PRIMARY FAILED (core#2261 regression): workspace $WS_ID still reads status=online ${RECONCILE_OFFLINE_TIMEOUT_SECS}s after its EC2 ($KILLED_IDS) was terminated. The reconciler did NOT detect the dead instance — a terminated EC2 is masquerading as a healthy workspace."
fi
# ─── 6b. SECONDARY assertion — auto-reprovision (best-effort) ──────────
# The onOffline → RestartByID existing-volume heal should bring the workspace
# back to 'online' on a NEW instance_id. This is best-effort: a full EC2 cold
# reprovision is a multi-minute path that shares the same boot-flake surface
# as the initial provision. If it doesn't finish within the bound we LOG it
# clearly but DO NOT fail — the PRIMARY assertion above is the gate.
#
# FUTURE TIGHTENING (deliberately one edit away): once this reprovision path
# is proven reliable on staging, promote the `log "SECONDARY ..."` soft-miss
# below to a `fail ...` so a stuck reprovision becomes a hard gate.
log "6b/6 SECONDARY (best-effort): asserting auto-reprovision to online with a NEW instance_id within ${REPROVISION_TIMEOUT_SECS}s..."
REPROV_DEADLINE=$(( $(date +%s) + REPROVISION_TIMEOUT_SECS ))
REPROV_OK=0
REPROV_LAST_STATUS=""
NEW_INSTANCE_ID=""
while true; do
if [ "$(date +%s)" -gt "$REPROV_DEADLINE" ]; then
break
fi
RP_STATUS=$(ws_field "$WS_ID" "status")
if [ "$RP_STATUS" != "$REPROV_LAST_STATUS" ]; then
log " $WS_ID status → ${RP_STATUS:-<empty>}"
REPROV_LAST_STATUS="$RP_STATUS"
fi
if [ "$RP_STATUS" = "online" ]; then
NEW_INSTANCE_ID=$(ws_field "$WS_ID" "instance_id")
if [ -n "$NEW_INSTANCE_ID" ] && [ "$NEW_INSTANCE_ID" != "$ORIGINAL_INSTANCE_ID" ]; then
REPROV_OK=1
break
fi
# online again but instance_id either not surfaced yet or still the old
# (terminated) id — keep polling until the reprovision swaps it.
fi
sleep 15
done
if [ "$REPROV_OK" = "1" ]; then
ok "SECONDARY held — auto-reprovisioned to online on NEW instance_id=$NEW_INSTANCE_ID (was $ORIGINAL_INSTANCE_ID)"
else
# Soft-miss — see FUTURE TIGHTENING note above. PRIMARY is the gate.
log "⚠️ SECONDARY not satisfied within ${REPROVISION_TIMEOUT_SECS}s (status=${REPROV_LAST_STATUS:-<empty>}, instance_id=${NEW_INSTANCE_ID:-<none>}, original=$ORIGINAL_INSTANCE_ID). NOT failing — the PRIMARY heal-detection assertion is the gate; reprovision is a slower, flakier cold path. Promote this to a hard fail once it's proven reliable."
fi
ok "Reconciler live E2E PASSED — PRIMARY heal-detection held (SECONDARY: $([ "$REPROV_OK" = "1" ] && echo "held" || echo "soft-miss, logged"))"
# Teardown runs via the EXIT trap.
+19
@@ -337,6 +337,25 @@ func main() {
})
}
+// CP-mode instance-state reconciler — authoritative EC2-liveness pass
+// for SaaS workspaces (core#2261). Every other liveness sweep keys off
+// a PROXY (Redis TTL, agent heartbeat, local Docker, or
+// runtime='external'); a SaaS claude-code workspace whose EC2 was
+// terminated/stopped falls through ALL of them and stays status='online'
+// pointing at a dead instance_id forever (root cause: core#2247). This
+// loop asks the ONE authoritative question the others lack —
+// cpProv.IsRunning (CP DescribeInstances-equivalent) — for each online
+// SaaS row, and on a CLEAN "not running" feeds it into the SAME
+// onWorkspaceOffline closure the other sweeps use (status flip +
+// RestartByID reprovision, existing volume). Fail-safe: IsRunning is
+// (true, err) on any transient error, so a CP blip never flips a healthy
+// workspace.
+if cpProv != nil {
+go supervised.RunWithRecover(ctx, "cp-instance-reconciler", func(c context.Context) {
+registry.StartCPInstanceReconciler(c, cpProv, onWorkspaceOffline, 60*time.Second)
+})
+}
// Pending-uploads GC sweep — deletes acked rows past their retention
// window plus unacked rows past expires_at. Without this the
// pending_uploads table grows unbounded; even with the 24h hard TTL,
+37 -10
@@ -380,12 +380,18 @@ func (h *ActivityHandler) List(c *gin.Context) {
// "row not found" — both indicate the cursor is no longer usable for
// this caller, no information leak.
var cursorTime time.Time
+var cursorSeq int64
usingCursor := false
if sinceID != "" {
+// Resolve BOTH ordering-key components of the cursor row. The feed is
+// ordered by (created_at, seq), so the strictly-after filter below must
+// compare the full tuple — comparing created_at alone silently drops a
+// row written in the SAME microsecond as the cursor row (the boundary
+// skip the since_id E2E intermittently tripped over).
err := db.DB.QueryRowContext(c.Request.Context(),
-`SELECT created_at FROM activity_logs WHERE id = $1 AND workspace_id = $2`,
+`SELECT created_at, seq FROM activity_logs WHERE id = $1 AND workspace_id = $2`,
sinceID, workspaceID,
-).Scan(&cursorTime)
+).Scan(&cursorTime, &cursorSeq)
if errors.Is(err, sql.ErrNoRows) {
c.JSON(http.StatusGone, gin.H{
"error": "since_id cursor not found (row may have been pruned or belongs to a different workspace); omit since_id to reset",
@@ -492,10 +498,20 @@ func (h *ActivityHandler) List(c *gin.Context) {
argIdx++
}
if usingCursor {
-// Strictly after — never replay the cursor row itself.
-query += fmt.Sprintf(" AND "+actCol+"created_at > $%d", argIdx)
-args = append(args, cursorTime)
-argIdx++
+// Strictly after the cursor on the FULL ordering key (created_at, seq).
+// Tuple comparison: a row is "after" the cursor if its created_at is
+// later, OR it shares the cursor's created_at but has a higher seq.
+// This (a) never replays the cursor row itself and (b) — unlike a bare
+// `created_at > cursor` — never drops a row written in the same
+// microsecond as the cursor row. Expressed as the expanded boolean
+// rather than a row-value `(created_at, seq) > ($t, $s)` so it composes
+// with the actCol qualifier prefix and the existing placeholder/arg
+// builder cleanly.
+query += fmt.Sprintf(
+" AND ("+actCol+"created_at > $%d OR ("+actCol+"created_at = $%d AND "+actCol+"seq > $%d))",
+argIdx, argIdx, argIdx+1)
+args = append(args, cursorTime, cursorSeq)
+argIdx += 2
}
// Polling clients (since_id) need oldest-first within the new window so
@@ -503,9 +519,13 @@ func (h *ActivityHandler) List(c *gin.Context) {
// since_id) keeps DESC — that's the canvas/UI shape and changing it
// would surprise existing callers.
if usingCursor {
-query += fmt.Sprintf(" ORDER BY "+actCol+"created_at ASC LIMIT $%d", argIdx)
+// (created_at, seq) ASC — seq is the deterministic tiebreaker for rows
+// sharing a microsecond-collided created_at. Replays in recorded order.
+query += fmt.Sprintf(" ORDER BY "+actCol+"created_at ASC, "+actCol+"seq ASC LIMIT $%d", argIdx)
} else {
-query += fmt.Sprintf(" ORDER BY "+actCol+"created_at DESC LIMIT $%d", argIdx)
+// (created_at, seq) DESC — same tiebreaker, newest-first for the
+// canvas/recent-feed shape.
+query += fmt.Sprintf(" ORDER BY "+actCol+"created_at DESC, "+actCol+"seq DESC LIMIT $%d", argIdx)
}
args = append(args, limit)
@@ -680,7 +700,8 @@ func buildSessionSearchQuery(workspaceID, query string, limit int) (string, []in
COALESCE(status, '') AS status,
request_body,
response_body,
-created_at
+created_at,
+seq
FROM activity_logs
WHERE workspace_id = $1
)
@@ -702,7 +723,13 @@ func buildSessionSearchQuery(workspaceID, query string, limit int) (string, []in
args = append(args, "%"+query+"%")
}
-sqlQuery += ` ORDER BY created_at DESC LIMIT $` + strconv.Itoa(len(args)+1)
+// Deterministic order: created_at alone is not unique (same-microsecond
+// rows), so tie-break on the monotonic seq — same fix as the since_id feed
+// (§ No flakes: no unstable sorts, even on an unused surface). `seq` is
+// projected through the session_items CTE above so this outer ORDER BY can
+// reference it — the outer SELECT can only sort on the CTE's output columns,
+// not on activity_logs directly.
+sqlQuery += ` ORDER BY created_at DESC, seq DESC LIMIT $` + strconv.Itoa(len(args)+1)
args = append(args, limit)
return sqlQuery, args
}
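The cursor hunk above builds the strictly-after filter as an expanded boolean and advances the placeholder index by two. A small standalone sketch of that builder step follows, assuming a hypothetical helper `appendCursorPredicate` (the handler inlines this logic and has no such function). The column prefix plays the role of `actCol`.

```go
package main

import "fmt"

// appendCursorPredicate appends the strictly-after (created_at, seq) filter to
// query. The cursor timestamp binds to $argIdx (referenced twice) and the
// cursor seq binds to $argIdx+1. It returns the extended query, the extended
// args, and the next free placeholder index.
func appendCursorPredicate(query, col string, args []interface{}, argIdx int, cursorTime, cursorSeq interface{}) (string, []interface{}, int) {
	query += fmt.Sprintf(
		" AND ("+col+"created_at > $%d OR ("+col+"created_at = $%d AND "+col+"seq > $%d))",
		argIdx, argIdx, argIdx+1)
	args = append(args, cursorTime, cursorSeq)
	return query, args, argIdx + 2
}

func main() {
	// Illustrative values only; the table alias "a." is an assumption.
	q, args, next := appendCursorPredicate(
		"SELECT id FROM activity_logs a WHERE a.workspace_id = $1",
		"a.", []interface{}{"ws-1"}, 2, "2026-06-04T10:00:00Z", int64(41))
	fmt.Println(q)
	fmt.Println(len(args), next) // 3 4
}
```

Two values are appended for three placeholder references, which is why the index moves by two rather than three: PostgreSQL allows the same numbered parameter to appear more than once in a statement.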
@@ -0,0 +1,211 @@
//go:build integration
// +build integration
// activity_seq_backfill_integration_test.go — REAL Postgres proof of the
// invariant the 20260604000000_activity_logs_seq.up.sql migration guarantees:
// every activity_logs row carries a NON-NULL `seq`, both for rows that existed
// before the migration ran (assigned during the ALTER TABLE rewrite) and for
// rows created afterward via the normal INSERT path (assigned by the IDENTITY
// default). This is the coverage CR2 (#2339 review) correctly flagged as
// missing on PR #2258.
//
// WHY THIS IS A SEPARATE TEST from activity_since_id_ordering_integration_test.go:
// that test pins the *ordering* contract (same-microsecond rows come back in a
// deterministic (created_at, seq) order). THIS test pins the *backfill* contract
// — that `seq` is never NULL — and the consequence the reviewer doubted: a
// pre-existing/backfilled row is usable as a since_id cursor because its seq is
// non-null, so the tuple cursor `(created_at, seq)` the handler builds is well
// defined for it.
//
// EMPIRICAL BASIS (PostgreSQL 16.13, the prod PG version):
// - `ALTER TABLE activity_logs ADD COLUMN seq BIGINT GENERATED BY DEFAULT AS
// IDENTITY` rewrites the table and assigns seq to EXISTING rows in physical
// table-scan order — they are NON-NULL, not left NULL as the review claimed.
// - The identity sequence then advances ABOVE max(seq), so the next INSERT
// that omits seq gets max+1 with no collision.
// Run against any Postgres 15/16 the integration harness boots — the property
// holds on both.
//
// Run with (same harness as activity_delegation_a2a_integration_test.go):
//
// docker run --rm -d --name pg-integration \
// -e POSTGRES_PASSWORD=test -e POSTGRES_DB=molecule \
// -p 55432:5432 postgres:15-alpine
// sleep 4
// # apply migrations (incl. 20260604000000_activity_logs_seq.up.sql) then:
// INTEGRATION_DB_URL="postgres://postgres:test@localhost:55432/molecule?sslmode=disable" \
// go test -tags=integration ./internal/handlers/ -run Integration_ActivityLogs_Seq
//
// WATCH-IT-FAIL: if `seq` were left nullable / un-backfilled (the failure mode
// the reviewer hypothesized), the NULL-count assertion in _NoNull trips, and
// the since_id-on-a-backfilled-row case in _SinceIDOnBackfilledRow trips because
// the handler cannot read a non-null seq for the cursor row. With the migration
// as written both are green every run.
package handlers
import (
"context"
"encoding/json"
"net/http"
"testing"
"time"
"git.moleculesai.app/molecule-ai/molecule-core/workspace-server/internal/db"
"github.com/gin-gonic/gin"
)
// TestIntegration_ActivityLogs_SeqBackfill_NoNull pins the core migration
// invariant: AFTER migrations have run, NO activity_logs row may have a NULL
// seq — neither rows that the seedActivityRowAt path inserts (IDENTITY default)
// nor any row the schema carries. It also proves the IDENTITY sequence keeps
// producing distinct, non-null seq for fresh inserts (no collision, no NULL).
//
// This is the assertion that would FAIL if the ALTER had left existing rows
// with NULL seq (the reviewer's claim) — table-scan backfill makes it pass.
func TestIntegration_ActivityLogs_SeqBackfill_NoNull(t *testing.T) {
conn := integrationDB_ActivityDelegationA2A(t)
_ = conn
wsID := seedWorkspace(t, conn, "test-2151-seq-backfill-nonull")
// Insert several rows via the normal path. seq is left to the IDENTITY
// default — exactly how production writes activity_logs.
t0 := time.Date(2026, 6, 4, 9, 0, 0, 0, time.UTC)
const n = 5
ids := make([]string, 0, n)
for i := 0; i < n; i++ {
ids = append(ids, seedActivityRowAt(t, wsID, "backfill-row", t0.Add(time.Duration(i)*time.Second)))
}
// (a) No row in this workspace may have a NULL seq. If the column were
// un-backfilled / nullable this is > 0 and the test fails.
var nullCount int
if err := db.DB.QueryRowContext(context.Background(),
`SELECT COUNT(*) FROM activity_logs WHERE workspace_id = $1 AND seq IS NULL`,
wsID,
).Scan(&nullCount); err != nil {
t.Fatalf("null-seq count query: %v", err)
}
if nullCount != 0 {
t.Fatalf("found %d activity_logs rows with NULL seq — migration did NOT backfill/assign seq", nullCount)
}
// Belt-and-suspenders: the GLOBAL invariant (no NULL seq anywhere in the
// table) is what the migration actually guarantees. Assert it too, so a
// regression that nulls seq for rows written by some other path is caught.
var globalNull int
if err := db.DB.QueryRowContext(context.Background(),
`SELECT COUNT(*) FROM activity_logs WHERE seq IS NULL`,
).Scan(&globalNull); err != nil {
t.Fatalf("global null-seq count query: %v", err)
}
if globalNull != 0 {
t.Fatalf("found %d activity_logs rows table-wide with NULL seq — seq must be non-null for every row", globalNull)
}
// (b) The IDENTITY sequence yields DISTINCT, monotonic, non-null seq for
// the rows we just inserted (proves the normal insert path gets a real seq,
// and that the sequence advanced past any backfilled max instead of
// colliding). We read them back in insert order and require strictly
// increasing, all-non-null seq.
rows, err := db.DB.QueryContext(context.Background(),
`SELECT seq FROM activity_logs WHERE workspace_id = $1 ORDER BY created_at ASC, seq ASC`,
wsID,
)
if err != nil {
t.Fatalf("read-back seq query: %v", err)
}
defer rows.Close()
var seqs []int64
for rows.Next() {
var s *int64 // pointer so a NULL would scan as nil rather than 0
if err := rows.Scan(&s); err != nil {
t.Fatalf("scan seq: %v", err)
}
if s == nil {
t.Fatal("a freshly-inserted activity_logs row has NULL seq — IDENTITY default did not fire")
}
seqs = append(seqs, *s)
}
if err := rows.Err(); err != nil {
t.Fatalf("rows err: %v", err)
}
if len(seqs) != n {
t.Fatalf("expected %d rows, read back %d", n, len(seqs))
}
for i := 1; i < len(seqs); i++ {
if seqs[i] <= seqs[i-1] {
t.Fatalf("seq not strictly increasing in insert order: %v (IDENTITY collision / reuse)", seqs)
}
}
}
// TestIntegration_ActivityLogs_SeqBackfill_SinceIDOnBackfilledRow pins the
// consequence the reviewer doubted: a row whose seq came from the migration /
// IDENTITY (i.e. NOT explicitly set by the caller) is usable as a since_id
// cursor, and a SECOND row sharing its exact created_at microsecond is returned
// (not dropped). This proves the handler's (created_at, seq) tuple cursor
// resolves a same-timestamp boundary that a created_at-only cursor would drop,
// AND that the cursor row's seq is non-null (else the handler could not build
// the tuple at all).
//
// Distinct from _BoundaryRowSameMicrosecondNotSkipped in the ordering test:
// here the explicit angle under test is "the cursor row's seq is a
// migration/IDENTITY-assigned (backfilled-style) value, non-null, and the
// handler uses it" — i.e. the backfill behavior is what makes the boundary
// resolution work, pinned head-on.
func TestIntegration_ActivityLogs_SeqBackfill_SinceIDOnBackfilledRow(t *testing.T) {
conn := integrationDB_ActivityDelegationA2A(t)
_ = conn
wsID := seedWorkspace(t, conn, "test-2151-seq-backfill-sinceid")
tSame := time.Date(2026, 6, 4, 10, 0, 0, 0, time.UTC)
// Cursor row: seq comes purely from the IDENTITY default (never set by
// the caller) — the same assignment mechanism the migration uses to
// backfill pre-existing rows. The "next" row shares the exact created_at
// microsecond and is inserted afterward, so it gets a strictly higher seq.
cursorID := seedActivityRowAt(t, wsID, "sinceid-cursor", tSame)
nextID := seedActivityRowAt(t, wsID, "sinceid-next-same-us", tSame)
// Prove the precondition the reviewer doubted: the cursor row's seq is
// NON-NULL, so the handler can read it to build the (created_at, seq)
// tuple. If it were NULL the handler's cursor lookup would yield a NULL
// seq and the strictly-after tuple comparison would mis-behave.
var cursorSeq *int64
if err := db.DB.QueryRowContext(context.Background(),
`SELECT seq FROM activity_logs WHERE id = $1`, cursorID,
).Scan(&cursorSeq); err != nil {
t.Fatalf("read cursor seq: %v", err)
}
if cursorSeq == nil {
t.Fatal("cursor row has NULL seq — a since_id cursor on a backfilled-style row would be unusable")
}
h := NewActivityHandler(nil)
c, w := newTestGinContext()
c.Params = gin.Params{{Key: "id", Value: wsID}}
q := c.Request.URL.Query()
q.Set("since_id", cursorID)
q.Set("type", "a2a_receive")
q.Set("limit", "10")
c.Request.URL.RawQuery = q.Encode()
h.List(c)
if w.Code != http.StatusOK {
t.Fatalf("List returned %d, want 200: %s", w.Code, w.Body.String())
}
var resp []map[string]interface{}
if err := json.Unmarshal(w.Body.Bytes(), &resp); err != nil {
t.Fatalf("unmarshal: %v", err)
}
// Exactly the one same-microsecond row after the cursor — present (not
// dropped by a strict created_at-only filter) and the cursor itself
// excluded (strictly-after on the full tuple).
if len(resp) != 1 {
t.Fatalf("same-microsecond row after backfilled-style cursor dropped: expected 1 row, got %d: %+v",
len(resp), resp)
}
if got, _ := resp[0]["id"].(string); got != nextID {
t.Fatalf("expected boundary row id %s, got %s", nextID, got)
}
}
@@ -0,0 +1,162 @@
//go:build integration
// +build integration
// activity_since_id_ordering_integration_test.go — REAL Postgres proof that
// the poll-mode since_id activity feed (#2339) is DETERMINISTICALLY ordered
// even when multiple rows collide on the same created_at microsecond.
//
// This is the test that the original bug report mis-labeled a "flake".
// sqlmock cannot catch it: sqlmock returns rows in the order the test stuffs
// them, so it can never reveal a non-deterministic ORDER BY. Only a real
// planner over real same-created_at rows exposes it.
//
// Run with (same harness as activity_delegation_a2a_integration_test.go):
//
// docker run --rm -d --name pg-integration \
// -e POSTGRES_PASSWORD=test -e POSTGRES_DB=molecule \
// -p 55432:5432 postgres:15-alpine
// sleep 4
// # apply migrations (incl. 20260604000000_activity_logs_seq.up.sql) then:
// INTEGRATION_DB_URL="postgres://postgres:test@localhost:55432/molecule?sslmode=disable" \
// go test -tags=integration ./internal/handlers/ -run Integration_SinceID
//
// WATCH-IT-FAIL: against the pre-fix handler (ORDER BY created_at only, no
// seq tiebreaker, and `created_at > cursor` strict) this test is unstable —
// the equal-created_at rows come back in arbitrary planner order so the
// ordered-id assertion fails intermittently, and the same-microsecond
// boundary row is dropped so the count assertion fails. With the fix
// (ORDER BY created_at, seq + tuple cursor) it is green every run.
package handlers
import (
"context"
"encoding/json"
"net/http"
"testing"
"time"
"git.moleculesai.app/molecule-ai/molecule-core/workspace-server/internal/db"
"github.com/gin-gonic/gin"
)
// seedActivityRowAt inserts one activity_logs row with an explicit created_at
// (so the test can force microsecond-equal collisions) and a unique summary;
// returns the generated id. seq is left to the IDENTITY default — Postgres
// assigns it in INSERT order, which is the deterministic tiebreaker under test.
// db.DB has been hot-swapped to the integration connection by
// integrationDB_ActivityDelegationA2A(t) in the calling test.
func seedActivityRowAt(t *testing.T, wsID, summary string, createdAt time.Time) string {
t.Helper()
var id string
err := db.DB.QueryRowContext(context.Background(), `
INSERT INTO activity_logs (workspace_id, activity_type, summary, status, created_at)
VALUES ($1, 'a2a_receive', $2, 'ok', $3)
RETURNING id
`, wsID, summary, createdAt).Scan(&id)
if err != nil {
t.Fatalf("seedActivityRowAt(%q): %v", summary, err)
}
return id
}
// TestIntegration_SinceID_StableOrderingSameMicrosecond proves the feed is
// deterministic when rows share a created_at, AND that the same-microsecond
// boundary row immediately after the cursor is NOT dropped.
func TestIntegration_SinceID_StableOrderingSameMicrosecond(t *testing.T) {
conn := integrationDB_ActivityDelegationA2A(t)
_ = conn
wsID := seedWorkspace(t, conn, "test-2151-sinceid-ordering")
// One earlier row to serve as the cursor (the "last processed" row).
tCursor := time.Date(2026, 6, 4, 12, 0, 0, 0, time.UTC)
cursorID := seedActivityRowAt(t, wsID, "cursor-row", tCursor)
// Three rows that ALL collide on the exact same created_at microsecond,
// inserted in a known order. Pre-fix, ORDER BY created_at alone returns
// these in arbitrary planner order.
tEqual := time.Date(2026, 6, 4, 12, 0, 1, 0, time.UTC)
idA := seedActivityRowAt(t, wsID, "equal-A", tEqual)
idB := seedActivityRowAt(t, wsID, "equal-B", tEqual)
idC := seedActivityRowAt(t, wsID, "equal-C", tEqual)
wantOrder := []string{idA, idB, idC}
// Drive the handler exactly as a polling client would.
h := NewActivityHandler(nil)
c, w := newTestGinContext()
c.Params = gin.Params{{Key: "id", Value: wsID}}
q := c.Request.URL.Query()
q.Set("since_id", cursorID)
q.Set("type", "a2a_receive")
q.Set("limit", "10")
c.Request.URL.RawQuery = q.Encode()
h.List(c)
if w.Code != http.StatusOK {
t.Fatalf("List returned %d, want 200: %s", w.Code, w.Body.String())
}
var resp []map[string]interface{}
if err := json.Unmarshal(w.Body.Bytes(), &resp); err != nil {
t.Fatalf("unmarshal: %v", err)
}
// All three equal-created_at rows must be present (boundary not dropped)
// and the cursor row itself must be excluded (strictly-after).
if len(resp) != len(wantOrder) {
t.Fatalf("expected %d rows after cursor (the 3 equal-created_at rows), got %d: %+v",
len(wantOrder), len(resp), resp)
}
gotOrder := make([]string, len(resp))
for i, row := range resp {
idVal, _ := row["id"].(string)
gotOrder[i] = idVal
}
for i := range wantOrder {
if gotOrder[i] != wantOrder[i] {
t.Fatalf("non-deterministic ordering: got id order %v, want %v (seq tiebreaker not applied)",
gotOrder, wantOrder)
}
}
}
// TestIntegration_SinceID_BoundaryRowSameMicrosecondNotSkipped isolates the
// cursor-boundary bug: a row written in the SAME microsecond as the cursor
// row (but with a higher seq) must still be returned. Pre-fix the strict
// `created_at > cursor` filter silently dropped it.
func TestIntegration_SinceID_BoundaryRowSameMicrosecondNotSkipped(t *testing.T) {
conn := integrationDB_ActivityDelegationA2A(t)
_ = conn
wsID := seedWorkspace(t, conn, "test-2151-sinceid-boundary")
tSame := time.Date(2026, 6, 4, 13, 0, 0, 0, time.UTC)
// Cursor row and the next row share the exact same created_at; the next
// row is inserted afterwards so it gets a higher seq.
cursorID := seedActivityRowAt(t, wsID, "boundary-cursor", tSame)
nextID := seedActivityRowAt(t, wsID, "boundary-next-same-us", tSame)
h := NewActivityHandler(nil)
c, w := newTestGinContext()
c.Params = gin.Params{{Key: "id", Value: wsID}}
q := c.Request.URL.Query()
q.Set("since_id", cursorID)
q.Set("type", "a2a_receive")
q.Set("limit", "10")
c.Request.URL.RawQuery = q.Encode()
h.List(c)
if w.Code != http.StatusOK {
t.Fatalf("List returned %d, want 200: %s", w.Code, w.Body.String())
}
var resp []map[string]interface{}
if err := json.Unmarshal(w.Body.Bytes(), &resp); err != nil {
t.Fatalf("unmarshal: %v", err)
}
if len(resp) != 1 {
t.Fatalf("same-microsecond boundary row dropped: expected exactly the 1 next row, got %d rows: %+v",
len(resp), resp)
}
if got, _ := resp[0]["id"].(string); got != nextID {
t.Fatalf("expected boundary row id %s, got %s", nextID, got)
}
}
@@ -26,17 +26,21 @@ func TestActivityHandler_SinceID_ReturnsNewerASC(t *testing.T) {
cursorID := "act-cursor-42"
cursorTime := time.Date(2026, 4, 30, 5, 0, 0, 0, time.UTC)
cursorSeq := int64(42)
// Step 1: cursor lookup — must include workspace_id scope so a UUID
// from another workspace can't be used.
mock.ExpectQuery(`SELECT created_at FROM activity_logs WHERE id = \$1 AND workspace_id = \$2`).
// from another workspace can't be used. Now resolves BOTH ordering-key
// components (created_at, seq) so the strictly-after filter can compare
// the full tuple.
mock.ExpectQuery(`SELECT created_at, seq FROM activity_logs WHERE id = \$1 AND workspace_id = \$2`).
WithArgs(cursorID, "ws-1").
WillReturnRows(sqlmock.NewRows([]string{"created_at"}).AddRow(cursorTime))
WillReturnRows(sqlmock.NewRows([]string{"created_at", "seq"}).AddRow(cursorTime, cursorSeq))
// Step 2: main query with the cursor's created_at as a > filter,
// ASC ordering. Args: workspace_id, cursorTime, limit.
// Step 2: main query with the cursor's (created_at, seq) as a tuple
// strictly-after filter, (created_at, seq) ASC ordering.
// Args: workspace_id, cursorTime, cursorSeq, limit.
mock.ExpectQuery("SELECT id, workspace_id, activity_type").
WithArgs("ws-1", cursorTime, 100).
WithArgs("ws-1", cursorTime, cursorSeq, 100).
WillReturnRows(newActivityRows())
broadcaster := newTestBroadcaster()
@@ -64,7 +68,7 @@ func TestActivityHandler_SinceID_ReturnsNewerASC(t *testing.T) {
func TestActivityHandler_SinceID_CursorNotFound_410(t *testing.T) {
mock := setupTestDB(t)
mock.ExpectQuery(`SELECT created_at FROM activity_logs WHERE id = \$1 AND workspace_id = \$2`).
mock.ExpectQuery(`SELECT created_at, seq FROM activity_logs WHERE id = \$1 AND workspace_id = \$2`).
WithArgs("act-gone", "ws-1").
WillReturnError(sql.ErrNoRows)
@@ -96,7 +100,7 @@ func TestActivityHandler_SinceID_CrossWorkspaceCursor_410(t *testing.T) {
// Cursor exists in DB but the WHERE workspace_id = $2 filter excludes
// it — sqlmock returns no rows, which is what Postgres would do.
mock.ExpectQuery(`SELECT created_at FROM activity_logs WHERE id = \$1 AND workspace_id = \$2`).
mock.ExpectQuery(`SELECT created_at, seq FROM activity_logs WHERE id = \$1 AND workspace_id = \$2`).
WithArgs("act-other-ws", "ws-1").
WillReturnError(sql.ErrNoRows)
@@ -120,20 +124,23 @@ func TestActivityHandler_SinceID_CrossWorkspaceCursor_410(t *testing.T) {
// TestActivityHandler_SinceID_CombinedWithSinceSecs: both filters apply
// together (AND). Argument order in the main query: workspace_id,
// since_secs, cursorTime, limit. Sanity-checks the placeholder index
// arithmetic in the query builder.
// since_secs, cursorTime, cursorSeq, limit. Sanity-checks the placeholder
// index arithmetic in the query builder: the cursor now binds TWO args
// (the (created_at, seq) tuple), so it shifts the trailing limit
// placeholder by two rather than one.
func TestActivityHandler_SinceID_CombinedWithSinceSecs(t *testing.T) {
mock := setupTestDB(t)
cursorID := "act-c"
cursorTime := time.Date(2026, 4, 30, 4, 0, 0, 0, time.UTC)
cursorSeq := int64(7)
mock.ExpectQuery(`SELECT created_at FROM activity_logs WHERE id = \$1 AND workspace_id = \$2`).
mock.ExpectQuery(`SELECT created_at, seq FROM activity_logs WHERE id = \$1 AND workspace_id = \$2`).
WithArgs(cursorID, "ws-1").
WillReturnRows(sqlmock.NewRows([]string{"created_at"}).AddRow(cursorTime))
WillReturnRows(sqlmock.NewRows([]string{"created_at", "seq"}).AddRow(cursorTime, cursorSeq))
mock.ExpectQuery("SELECT id, workspace_id, activity_type").
WithArgs("ws-1", 600, cursorTime, 100).
WithArgs("ws-1", 600, cursorTime, cursorSeq, 100).
WillReturnRows(newActivityRows())
broadcaster := newTestBroadcaster()
@@ -24,13 +24,23 @@ import (
// validateRegisteredModelForRuntime reports whether (runtime, model) is
// selectable per the provider registry. Returns:
//
// (true, "") — allowed: model is registered for this runtime, OR the
// runtime is not in the registry (fail-open), OR model=="".
// (false, reason) — rejected: the runtime IS registered but the model is not
// in its native ModelsForRuntime set.
// (true, "") — allowed: model is on the runtime's platform menu
// (ModelsForRuntime) OR DeriveProvider(runtime, model)
// RESOLVES a native provider (the cp#529 routability-aware
// BYOK path), OR the runtime is not in the registry
// (fail-open), OR model=="".
// (false, reason) — rejected: the runtime IS registered, the model is not on
// its platform menu, AND no native provider prefix-owns it
// (genuinely unroutable).
//
// model=="" is allowed here: the MODEL_REQUIRED gate owns the empty-model case,
// so this validator must not double-reject it.
//
// ROUTABILITY-AWARE (cp#529, CTO Option C): the final predicate is an OR —
// `model ∈ ModelsForRuntime(runtime)` OR `DeriveProvider(runtime, model, nil)`
// resolves. The platform menu carries platform-billed ids; the DeriveProvider
// path covers BYOK ids that prefix-match a name-only native arm (no platform
// billing). The drift checker in molecule-controlplane mirrors this exact OR.
func validateRegisteredModelForRuntime(runtime, model string) (bool, string) {
model = strings.TrimSpace(model)
if model == "" {
@@ -52,6 +62,24 @@ func validateRegisteredModelForRuntime(runtime, model string) (bool, string) {
return true, ""
}
}
// ROUTABILITY-AWARE allow path (cp#529, CTO-approved Option C). The model is
// NOT on the runtime's platform menu (ModelsForRuntime) — but a model can be
// legitimately SELECTABLE without being a platform-menu id: a BYOK id whose
// prefix matches one of the runtime's NATIVE provider arms (a name-only arm
// added in providers.yaml) resolves to a concrete provider via DeriveProvider
// even though it carries no platform billing. Allow it iff DeriveProvider
// resolves a provider for (runtime, model). A genuinely-unroutable id (no
// native provider prefix-owns it) still falls through to the 422 below.
//
// BILLING GUARDRAIL: only CONFIRMED-NON-PLATFORM (BYOK) providers are wired as
// name-only arms in providers.yaml (never platform/anthropic-*/openai-*/
// moonshot/minimax/google/vertex), so a DeriveProvider-resolved id reached by
// THIS path can never bill the platform's key for a customer's model. The
// platform-menu ids that DO carry platform billing are already allowed by the
// exact-membership loop above; this path only ever resolves to a BYOK arm.
if _, derr := m.DeriveProvider(runtime, model, nil); derr == nil {
return true, ""
}
return false, fmt.Sprintf(
"model %q is not a registered model for runtime %q; pick one of the runtime's registered models (provider-registry SSOT, internal#718)",
model, runtime)
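The OR predicate described in the comments above can be sketched in isolation. This is a simplified model under stated assumptions: the `registry` type, its `menu` and `arms` fields, and the sample data are hypothetical stand-ins for the provider manifest, and a regexp prefix match stands in for whatever `DeriveProvider` does internally. Only the `^deepseek[-:/]` pattern comes from the test comments in this diff; `^openrouter/` is an assumed example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// registry is a hypothetical, simplified provider registry.
type registry struct {
	menu map[string][]string         // runtime -> platform-menu model ids
	arms map[string][]*regexp.Regexp // runtime -> BYOK native-arm prefix patterns
}

// selectable mirrors the allow rules: empty model, unregistered runtime
// (fail-open), platform-menu membership, or a native arm that prefix-owns the id.
func (r registry) selectable(runtime, model string) bool {
	model = strings.TrimSpace(model)
	if model == "" {
		return true // another gate owns the empty-model case
	}
	menu, hasMenu := r.menu[runtime]
	arms, hasArms := r.arms[runtime]
	if !hasMenu && !hasArms {
		return true // runtime not in the registry: fail-open
	}
	for _, id := range menu {
		if id == model {
			return true // exact platform-menu membership
		}
	}
	for _, re := range arms {
		if re.MatchString(model) {
			return true // routable through a BYOK arm
		}
	}
	return false // genuinely unroutable
}

// demoRegistry returns illustrative data only.
func demoRegistry() registry {
	return registry{
		menu: map[string][]string{"hermes": {"moonshot/kimi-k2.6"}},
		arms: map[string][]*regexp.Regexp{"hermes": {
			regexp.MustCompile(`^openrouter/`),
			regexp.MustCompile(`^deepseek[-:/]`),
		}},
	}
}

func main() {
	reg := demoRegistry()
	for _, m := range []string{"moonshot/kimi-k2.6", "deepseek/deepseek-chat", "gpt-4o"} {
		fmt.Printf("%s selectable=%v\n", m, reg.selectable("hermes", m))
	}
}
```

In this sketch a bare `gpt-4o` on `hermes` is rejected because no arm pattern owns it, which matches the intent of the `genuinely_unroutable_still_rejected` case.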
@@ -79,6 +79,50 @@ func TestValidateRegisteredModelForRuntime(t *testing.T) {
model: "",
wantOK: true,
},
// ---- cp#529 routability-aware allow path -------------------------------
{
// BYOK passthrough id: NOT on hermes's platform menu, but the
// openrouter name-only native arm prefix-owns it → DeriveProvider
// resolves → ALLOWED (no platform billing — openrouter is BYOK).
name: "byok_passthrough_routable_now_allowed",
runtime: "hermes",
model: "openrouter/anthropic/claude-3.5-sonnet",
wantOK: true,
},
{
// BYOK namespaced vendor id: deepseek's widened ^deepseek[-:/]
// matches the vendor/ form on a name-only hermes arm → allowed.
name: "byok_namespaced_vendor_routable_now_allowed",
runtime: "hermes",
model: "deepseek/deepseek-chat",
wantOK: true,
},
{
// claude-code bare GLM- BYOK id: zai name-only arm + (?i)^(glm-|…)
// matches → DeriveProvider resolves → allowed.
name: "claude_code_bare_glm_byok_routable_now_allowed",
runtime: "claude-code",
model: "GLM-4.6",
wantOK: true,
},
{
// Genuinely UNROUTABLE id: no native hermes arm prefix-owns bare
// gpt-4o (the platform-shared openai vendor is NOT wired into hermes
// — billing guardrail), so DeriveProvider errors → still 422.
name: "genuinely_unroutable_still_rejected",
runtime: "hermes",
model: "gpt-4o",
wantOK: false,
},
{
// A namespaced vendor id NOW routable on hermes via the dedicated
// byok-openai provider (cp#529 BYOK-vendor arms): routes with the
// tenant's OPENAI_API_KEY → BYOK billing, never the platform key.
name: "byok_openai_namespaced_routable_now_allowed",
runtime: "hermes",
model: "openai/gpt-4o",
wantOK: true,
},
}
for _, c := range cases {
t.Run(c.name, func(t *testing.T) {
@@ -109,58 +153,58 @@ func TestValidateDerivedProviderInRegistry(t *testing.T) {
// provider that IS in the providers list. These are the live corpus
// entries; the test pins the registry-consistency invariant.
{
name: "claude_code_anthropic_api_native",
name: "claude_code_anthropic_api_native",
runtime: "claude-code",
model: "claude-sonnet-4-6",
wantOK: true,
wantOK: true,
},
{
name: "claude_code_kimi_coding_native",
name: "claude_code_kimi_coding_native",
runtime: "claude-code",
model: "kimi-for-coding",
wantOK: true,
wantOK: true,
},
{
name: "claude_code_minimax_native",
name: "claude_code_minimax_native",
runtime: "claude-code",
model: "MiniMax-M2.7",
wantOK: true,
wantOK: true,
},
{
name: "claude_code_platform_namespaced",
name: "claude_code_platform_namespaced",
runtime: "claude-code",
model: "moonshot/kimi-k2.6",
wantOK: true,
wantOK: true,
},
{
name: "codex_openai_subscription_default_arm",
name: "codex_openai_subscription_default_arm",
runtime: "codex",
model: "gpt-5.5",
wantOK: true,
wantOK: true,
},
{
name: "codex_platform_namespaced",
name: "codex_platform_namespaced",
runtime: "codex",
model: "openai/gpt-5.4-mini",
wantOK: true,
wantOK: true,
},
{
name: "hermes_kimi_coding",
name: "hermes_kimi_coding",
runtime: "hermes",
model: "kimi-coding/kimi-k2",
wantOK: true,
wantOK: true,
},
{
name: "hermes_platform_namespaced",
name: "hermes_platform_namespaced",
runtime: "hermes",
model: "moonshot/kimi-k2.6",
wantOK: true,
wantOK: true,
},
{
name: "openclaw_kimi_coding",
name: "openclaw_kimi_coding",
runtime: "openclaw",
model: "moonshot:kimi-k2.6",
wantOK: true,
wantOK: true,
},
// FAIL — model-side validator catches this, but the provider-side
// gate is called AFTER it in Create and inherits the fail-open
@@ -168,30 +212,30 @@ func TestValidateDerivedProviderInRegistry(t *testing.T) {
// errors → allow, letting the model-side response own the message).
// This is the deliberate "don't double-reject" decision.
{
name: "unregistered_model_pass_through_to_model_side",
name: "unregistered_model_pass_through_to_model_side",
runtime: "claude-code",
model: "totally-made-up-model-xyz",
wantOK: true, // pass-through: model-side validator owns the rejection
wantOK: true, // pass-through: model-side validator owns the rejection
},
// Federation contract — mirror of the model-side test above.
{
name: "langgraph_runtime_failopen",
name: "langgraph_runtime_failopen",
runtime: "langgraph",
model: "anything-goes",
wantOK: true,
wantOK: true,
},
{
name: "external_runtime_failopen",
name: "external_runtime_failopen",
runtime: "external",
model: "whatever",
wantOK: true,
wantOK: true,
},
// Empty model — MODEL_REQUIRED owns it; allow.
{
name: "empty_model_allowed_other_gate_owns_it",
name: "empty_model_allowed_other_gate_owns_it",
runtime: "claude-code",
model: "",
wantOK: true,
wantOK: true,
},
}
for _, c := range cases {
@@ -161,7 +161,7 @@ func (h *PluginsHandler) uninstallViaDocker(ctx context.Context, c *gin.Context,
// 1. Strip plugin's rule/fragment markers from CLAUDE.md (mirrors
// AgentskillsAdaptor.uninstall lines 184-188). Best-effort: if
// the user edited CLAUDE.md, our marker stays untouched.
h.stripPluginMarkersFromMemory(ctx, containerName, pluginName)
h.stripPluginMarkersFromMemory(ctx, workspaceID, containerName, pluginName)
// 2. Remove copied skill dirs declared in the plugin's plugin.yaml.
for _, skill := range skillNames {
@@ -171,9 +171,11 @@ func (h *PluginsHandler) uninstallViaDocker(ctx context.Context, c *gin.Context,
log.Printf("Plugin uninstall: skipping invalid skill name %q in %s: %v", skill, pluginName, err)
continue
}
_, _ = h.execAsRoot(ctx, containerName, []string{
if _, rmErr := h.execAsRoot(ctx, containerName, []string{
"rm", "-rf", "/configs/skills/" + skill,
})
}); rmErr != nil {
log.Printf("Plugin uninstall: failed to remove skill %s from %s: %v", skill, workspaceID, rmErr)
}
}
// 3. Delete the plugin directory itself (as root to handle file ownership).
@@ -393,7 +393,7 @@ func (h *PluginsHandler) readPluginSkillsFromContainer(ctx context.Context, cont
// `# Plugin: <name> /` — mirrors AgentskillsAdaptor.uninstall's stripping
// logic so install/uninstall are symmetric. Best-effort: a read or write
// failure is logged but not returned, since the rest of uninstall must still succeed.
func (h *PluginsHandler) stripPluginMarkersFromMemory(ctx context.Context, containerName, pluginName string) {
func (h *PluginsHandler) stripPluginMarkersFromMemory(ctx context.Context, workspaceID, containerName, pluginName string) {
// Use sed via bash -c for atomic in-place delete: drop the marker line
// and the blank line that follows it (install adds a leading blank line
// before the marker via append_to_memory). Three sed passes mirror the
@@ -417,7 +417,9 @@ func (h *PluginsHandler) stripPluginMarkersFromMemory(ctx context.Context, conta
`awk 'BEGIN{skip=0; blanks=0} /^%s/{skip=1; blanks=0; next} skip==1 && /^[[:space:]]*$/{blanks++; if(blanks>=2){skip=0; print; next} next} /^# Plugin: /{if(skip==1)skip=0} skip==1{next} {print}' /configs/CLAUDE.md > /tmp/claude.new && mv /tmp/claude.new /configs/CLAUDE.md`,
regexpEscapeForAwk(marker),
)
_, _ = h.execAsRoot(ctx, containerName, []string{"bash", "-c", script})
if _, awkErr := h.execAsRoot(ctx, containerName, []string{"bash", "-c", script}); awkErr != nil {
log.Printf("Plugin uninstall: failed to strip markers from CLAUDE.md for %s in %s: %v", pluginName, workspaceID, awkErr)
}
}
// regexpEscapeForAwk escapes characters that have special meaning inside an
@@ -0,0 +1,331 @@
package handlers
import (
"context"
"database/sql"
"net/http"
"net/http/httptest"
"net/url"
"strings"
"testing"
"time"
"github.com/DATA-DOG/go-sqlmock"
"github.com/gin-gonic/gin"
"github.com/gorilla/websocket"
)
// rfbGreeting is the first frame a real websockify/RFB backend writes on
// connect. The fake backend below sends these exact bytes so the positive
// test can prove the upstream's first binary frame survives the reverse
// proxy chain (the "WS 1006" regression surface from core#2247 was the
// upgrade/handshake silently failing before any RFB byte reached the
// browser).
var rfbGreeting = []byte("RFB 003.008\n")
// newFakeWebsockifyBackend stands up an httptest.NewServer that upgrades the
// websocket, writes the RFB greeting as a binary frame, then echoes every
// frame it receives back to the client. No EC2, noVNC, or SSH involved — it
// is the stand-in for the on-instance :6080 websockify listener that
// realDisplayForward would normally tunnel to.
func newFakeWebsockifyBackend(t *testing.T) *httptest.Server {
t.Helper()
upgrader := websocket.Upgrader{
// The proxy rewrites Sec-WebSocket-Protocol to "binary"; accept any
// origin/subprotocol so the fake backend never rejects the handshake.
CheckOrigin: func(*http.Request) bool { return true },
Subprotocols: []string{"binary"},
HandshakeTimeout: 5 * time.Second,
EnableCompression: false,
}
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
conn, err := upgrader.Upgrade(w, r, nil)
if err != nil {
return
}
defer conn.Close()
if err := conn.WriteMessage(websocket.BinaryMessage, rfbGreeting); err != nil {
return
}
for {
mt, msg, err := conn.ReadMessage()
if err != nil {
return
}
if err := conn.WriteMessage(mt, msg); err != nil {
return
}
}
}))
t.Cleanup(srv.Close)
return srv
}
// wireDisplayForwardToBackend overrides the injectable displayForward package
// var so DisplaySession proxies to the fake backend instead of opening an EIC
// SSH tunnel. Restored via t.Cleanup. The *url.URL handed to fn is the http://
// backend address (the reverse proxy upgrades it to ws:// natively under
// Go 1.25's ReverseProxy WebSocket support).
func wireDisplayForwardToBackend(t *testing.T, backendURL string) {
t.Helper()
target, err := url.Parse(backendURL)
if err != nil {
t.Fatalf("parse backend URL %q: %v", backendURL, err)
}
prev := displayForward
displayForward = func(_ context.Context, _ string, fn func(target *url.URL) error) error {
return fn(target)
}
t.Cleanup(func() { displayForward = prev })
}
// newDisplaySessionTestServer mounts DisplaySession on a gin router behind an
// httptest.NewServer so a real websocket client can dial the route end-to-end.
// It returns the test server; callers derive the ws:// URL from srv.URL.
func newDisplaySessionTestServer(t *testing.T, handler *WorkspaceHandler) *httptest.Server {
t.Helper()
r := gin.New()
// Mirror the production registration in internal/router/router.go:
// GET /workspaces/:id/display/session/*proxyPath -> wh.DisplaySession
r.GET("/workspaces/:id/display/session/*proxyPath", handler.DisplaySession)
srv := httptest.NewServer(r)
t.Cleanup(srv.Close)
return srv
}
const (
displayProxyWorkspaceID = "ws-display"
displayProxyInstanceID = "i-0fakedeadbeef00001"
displayProxyControlledBy = "admin-token"
)
// expectDisplaySessionTargetRow mocks loadWorkspaceDisplaySessionTarget's
// workspaces SELECT. mode "desktop-control" + a non-empty instance_id is the
// "display enabled, tunnel available" shape. (Note: the compute validator
// accepts modes none/desktop-control/gpu-desktop-control and protocols
// dcv/novnc — "novnc" is a *protocol*, not a mode, so the enabled rows use
// mode=desktop-control,protocol=novnc.)
func expectDisplaySessionTargetRow(mock sqlmock.Sqlmock, computeJSON, instanceID string) {
mock.ExpectQuery(`SELECT COALESCE\(compute, '\{\}'::jsonb\), COALESCE\(instance_id, ''\) FROM workspaces WHERE id = \$1`).
WithArgs(displayProxyWorkspaceID).
WillReturnRows(sqlmock.NewRows([]string{"compute", "instance_id"}).AddRow(computeJSON, instanceID))
}
// expectActiveDisplayControlRow mocks loadActiveDisplayControl's locks SELECT
// returning an active lock owned by controlledBy expiring at expiresAt.
func expectActiveDisplayControlRow(mock sqlmock.Sqlmock, controlledBy string, expiresAt time.Time) {
mock.ExpectQuery(`SELECT controller, controlled_by, expires_at FROM workspace_display_control_locks WHERE workspace_id = \$1 AND expires_at > now\(\)`).
WithArgs(displayProxyWorkspaceID).
WillReturnRows(sqlmock.NewRows([]string{"controller", "controlled_by", "expires_at"}).
AddRow("user", controlledBy, expiresAt))
}
const enabledComputeJSON = `{"display":{"mode":"desktop-control","protocol":"novnc","width":1280,"height":800}}`
// dialDisplaySession dials the websockify route on the given test server with
// the supplied Sec-WebSocket-Protocol values. It returns the conn (nil on
// failure), the HTTP response, and the dial error.
func dialDisplaySession(t *testing.T, srv *httptest.Server, subprotocols []string) (*websocket.Conn, *http.Response, error) {
t.Helper()
wsURL := "ws" + strings.TrimPrefix(srv.URL, "http") + "/workspaces/" + displayProxyWorkspaceID + "/display/session/websockify"
dialer := websocket.Dialer{
HandshakeTimeout: 5 * time.Second,
Subprotocols: subprotocols,
}
return dialer.Dial(wsURL, nil)
}
// TestDisplaySessionProxy_Positive proves the full take-control WS-proxy path
// without any network/EC2: a valid signed token + active lock + enabled
// display upgrades successfully (HTTP 101), the backend's RFB greeting arrives
// through the proxy, and a client->server byte round-trips back (bidirectional
// proxy chain). This is the direct regression guard for the "WS 1006" failure
// class in core#2247.
func TestDisplaySessionProxy_Positive(t *testing.T) {
t.Setenv("DISPLAY_SESSION_SIGNING_SECRET", "test-secret")
mock := setupTestDB(t)
backend := newFakeWebsockifyBackend(t)
wireDisplayForwardToBackend(t, backend.URL)
handler := NewWorkspaceHandler(newTestBroadcaster(), nil, "http://localhost:8080", t.TempDir())
srv := newDisplaySessionTestServer(t, handler)
expiresAt := time.Now().Add(5 * time.Minute)
expectDisplaySessionTargetRow(mock, enabledComputeJSON, displayProxyInstanceID)
expectActiveDisplayControlRow(mock, displayProxyControlledBy, expiresAt)
token := signDisplaySessionToken(displayProxyWorkspaceID, displayProxyControlledBy, expiresAt)
if token == "" {
t.Fatal("signDisplaySessionToken returned empty token")
}
conn, resp, err := dialDisplaySession(t, srv, []string{"binary", displaySessionTokenProtocolPrefix + token})
if err != nil {
body := ""
if resp != nil {
body = resp.Status
}
t.Fatalf("websocket dial failed: %v (resp=%s)", err, body)
}
t.Cleanup(func() { conn.Close() })
if resp.StatusCode != http.StatusSwitchingProtocols {
t.Fatalf("expected 101 Switching Protocols, got %d", resp.StatusCode)
}
// 1. The backend's RFB greeting must arrive through the proxy.
conn.SetReadDeadline(time.Now().Add(5 * time.Second))
mt, msg, err := conn.ReadMessage()
if err != nil {
t.Fatalf("read greeting through proxy failed: %v", err)
}
if mt != websocket.BinaryMessage || string(msg) != string(rfbGreeting) {
t.Fatalf("greeting = %q (type %d), want %q binary", msg, mt, rfbGreeting)
}
// 2. A client->server byte must echo back (bidirectional chain).
probe := []byte{0x13, 0x37, 0x00, 0xff}
if err := conn.WriteMessage(websocket.BinaryMessage, probe); err != nil {
t.Fatalf("write probe through proxy failed: %v", err)
}
conn.SetReadDeadline(time.Now().Add(5 * time.Second))
_, echo, err := conn.ReadMessage()
if err != nil {
t.Fatalf("read echo through proxy failed: %v", err)
}
if string(echo) != string(probe) {
t.Fatalf("echo = %q, want %q", echo, probe)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
}
}
// TestDisplaySessionProxy_Rejections is table-driven over the failure surface.
// Each case asserts the WS upgrade does NOT happen (dial errors / no 101) and
// the right HTTP status is returned, WITHOUT ever reaching the fake backend.
func TestDisplaySessionProxy_Rejections(t *testing.T) {
t.Setenv("DISPLAY_SESSION_SIGNING_SECRET", "test-secret")
pastExpiry := time.Now().Add(-5 * time.Minute)
futureExpiry := time.Now().Add(5 * time.Minute)
cases := []struct {
name string
// expect wires the sqlmock rows that the handler will actually read
// for this case (the locks SELECT is only reached for token cases).
expect func(mock sqlmock.Sqlmock)
// subprotocols sent on the dial (token header, if any).
subprotocols []string
// proxyPath overrides the default "/websockify" route segment.
proxyPath string
wantStatus int
}{
{
name: "missing token -> 403",
expect: func(m sqlmock.Sqlmock) {
expectDisplaySessionTargetRow(m, enabledComputeJSON, displayProxyInstanceID)
expectActiveDisplayControlRow(m, displayProxyControlledBy, futureExpiry)
},
subprotocols: []string{"binary"},
wantStatus: http.StatusForbidden,
},
{
name: "tampered token -> 403",
expect: func(m sqlmock.Sqlmock) {
expectDisplaySessionTargetRow(m, enabledComputeJSON, displayProxyInstanceID)
expectActiveDisplayControlRow(m, displayProxyControlledBy, futureExpiry)
},
subprotocols: []string{"binary", displaySessionTokenProtocolPrefix + "garbage.not-a-valid-mac"},
wantStatus: http.StatusForbidden,
},
{
name: "expired lock -> 403",
expect: func(m sqlmock.Sqlmock) {
expectDisplaySessionTargetRow(m, enabledComputeJSON, displayProxyInstanceID)
// Active-lock query filters expires_at > now(), so an
// expired lock returns no rows -> found=false -> 403.
m.ExpectQuery(`SELECT controller, controlled_by, expires_at FROM workspace_display_control_locks WHERE workspace_id = \$1 AND expires_at > now\(\)`).
WithArgs(displayProxyWorkspaceID).
WillReturnError(sql.ErrNoRows)
},
// Token signed against the past expiry would also fail validation
// even if a stale lock row were returned.
subprotocols: []string{"binary", displaySessionTokenProtocolPrefix +
signDisplaySessionToken(displayProxyWorkspaceID, displayProxyControlledBy, pastExpiry)},
wantStatus: http.StatusForbidden,
},
{
name: "display mode none -> 404",
expect: func(m sqlmock.Sqlmock) {
expectDisplaySessionTargetRow(m, `{"display":{"mode":"none"}}`, displayProxyInstanceID)
},
subprotocols: []string{"binary"},
wantStatus: http.StatusNotFound,
},
{
name: "empty instance_id -> 503",
expect: func(m sqlmock.Sqlmock) {
expectDisplaySessionTargetRow(m, enabledComputeJSON, "")
},
subprotocols: []string{"binary"},
wantStatus: http.StatusServiceUnavailable,
},
{
name: "wrong proxyPath -> 404",
expect: func(m sqlmock.Sqlmock) {
expectDisplaySessionTargetRow(m, enabledComputeJSON, displayProxyInstanceID)
},
subprotocols: []string{"binary"},
proxyPath: "/frames",
wantStatus: http.StatusNotFound,
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
mock := setupTestDB(t)
// A backend that records whether it is ever reached; the check after the
// dial proves these rejections happen strictly before any proxy dial.
reached := false
backend := httptest.NewServer(http.HandlerFunc(func(http.ResponseWriter, *http.Request) {
reached = true
}))
t.Cleanup(backend.Close)
wireDisplayForwardToBackend(t, backend.URL)
handler := NewWorkspaceHandler(newTestBroadcaster(), nil, "http://localhost:8080", t.TempDir())
srv := newDisplaySessionTestServer(t, handler)
tc.expect(mock)
proxyPath := tc.proxyPath
if proxyPath == "" {
proxyPath = "/websockify"
}
wsURL := "ws" + strings.TrimPrefix(srv.URL, "http") +
"/workspaces/" + displayProxyWorkspaceID + "/display/session" + proxyPath
dialer := websocket.Dialer{HandshakeTimeout: 5 * time.Second, Subprotocols: tc.subprotocols}
conn, resp, err := dialer.Dial(wsURL, nil)
if conn != nil {
conn.Close()
}
if err == nil {
t.Fatalf("expected WS upgrade to fail, but dial succeeded")
}
if resp == nil {
t.Fatalf("expected an HTTP response on rejected upgrade, got nil (err=%v)", err)
}
if resp.StatusCode != tc.wantStatus {
t.Fatalf("status = %d, want %d", resp.StatusCode, tc.wantStatus)
}
if resp.StatusCode == http.StatusSwitchingProtocols {
t.Fatalf("upgrade unexpectedly succeeded (101)")
}
if reached {
t.Fatalf("rejection leaked to the upstream backend")
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
}
})
}
}
@@ -1,160 +0,0 @@
package models
// Contract test: the EXACT request bodies the workspace runtime emits for
// POST /registry/register and POST /registry/heartbeat bind cleanly against
// the real RegisterPayload / HeartbeatPayload structs — and a body missing a
// binding:"required" field is REJECTED.
//
// Why this exists — the same blind-spot class as the #2251 A2A bug
// ----------------------------------------------------------------
// The existing registry_test.go binds HAND-WRITTEN JSON literals
// (`{"id":"ws-123","agent_card":{...}}`) that encode the *test author's*
// idea of the wire shape, not the bytes the runtime actually produces. The
// runtime's producer (molecule-ai-workspace-runtime main.py:484 /
// heartbeat.py:233) is a separate hand-rolled dict. Nothing pinned that the
// two agree on the required keys.
//
// These golden bodies are byte-for-byte the shapes the runtime emits (see the
// companion Python contract test test_registry_payload_contract.py, which
// asserts the runtime PRODUCES exactly these required keys). Together the two
// halves form a producer→consumer contract: if the runtime drops a required
// key, the Python test fails; if this struct adds/renames a required field,
// the Go test below fails — drift can't pass silently on either side.
//
// gin's ShouldBindJSON runs `binding.JSON.BindBody`, which is json.Unmarshal
// followed by the go-playground validator on the `binding` tags. We invoke
// that exact path here without standing up a gin.Context / DB / Redis.
import (
"testing"
"github.com/gin-gonic/gin/binding"
)
// bindJSON mirrors gin's ShouldBindJSON: decode + validate the `binding` tags.
func bindJSON(t *testing.T, body []byte, out any) error {
t.Helper()
return binding.JSON.BindBody(body, out)
}
// ---- /registry/register --------------------------------------------------
// The exact body main.py emits (workspace_id + workspace_url + the hand-rolled
// agent_card_dict). agent_card is json.RawMessage on the struct so its inner
// shape is opaque to the bind — only presence is required.
const runtimeRegisterBody = `{
"id": "11111111-1111-1111-1111-111111111111",
"url": "https://ws.example/a2a",
"agent_card": {
"name": "pm",
"description": "team lead",
"version": "1.0.0",
"url": "https://ws.example/a2a",
"skills": [{"id": "coding", "name": "coding", "description": "coding", "tags": []}],
"capabilities": {"streaming": true, "pushNotifications": false},
"configuration_status": "ready"
}
}`
func TestRegisterPayload_RuntimeBodyBinds(t *testing.T) {
var p RegisterPayload
if err := bindJSON(t, []byte(runtimeRegisterBody), &p); err != nil {
t.Fatalf("runtime register body must bind against RegisterPayload, got: %v", err)
}
if p.ID != "11111111-1111-1111-1111-111111111111" {
t.Errorf("id not decoded: %q", p.ID)
}
if len(p.AgentCard) == 0 {
t.Error("agent_card must be present (binding:required)")
}
if p.URL == "" {
t.Error("url should round-trip from the runtime body")
}
}
func TestRegisterPayload_MissingID_Rejected(t *testing.T) {
// The #2251-style regression: runtime drops the required `id` key.
const noID = `{"url":"https://ws.example/a2a","agent_card":{"name":"pm"}}`
var p RegisterPayload
if err := bindJSON(t, []byte(noID), &p); err == nil {
t.Fatal("a register body missing the required `id` MUST be rejected (would 400); got nil error")
}
}
func TestRegisterPayload_MissingAgentCard_Rejected(t *testing.T) {
const noCard = `{"id":"ws-1","url":"https://ws.example/a2a"}`
var p RegisterPayload
if err := bindJSON(t, []byte(noCard), &p); err == nil {
t.Fatal("a register body missing the required `agent_card` MUST be rejected (would 400); got nil error")
}
}
// ---- /registry/heartbeat -------------------------------------------------
// The exact body heartbeat.py:233 emits (no wedge/metadata, the healthy case).
const runtimeHeartbeatBody = `{
"workspace_id": "00000000-0000-0000-0000-000000000688",
"error_rate": 0.0,
"sample_error": "",
"active_tasks": 0,
"current_task": "",
"uptime_seconds": 42
}`
func TestHeartbeatPayload_RuntimeBodyBinds(t *testing.T) {
var p HeartbeatPayload
if err := bindJSON(t, []byte(runtimeHeartbeatBody), &p); err != nil {
t.Fatalf("runtime heartbeat body must bind against HeartbeatPayload, got: %v", err)
}
if p.WorkspaceID != "00000000-0000-0000-0000-000000000688" {
t.Errorf("workspace_id not decoded: %q", p.WorkspaceID)
}
if p.UptimeSeconds != 42 {
t.Errorf("uptime_seconds not decoded: %d", p.UptimeSeconds)
}
}
// The wedged-runtime heartbeat (heartbeat.py _runtime_state_payload +
// _runtime_metadata_payload layered on) must also bind — runtime_metadata is a
// pointer so a present block decodes, and an absent one stays nil.
const runtimeHeartbeatWedgedBody = `{
"workspace_id": "00000000-0000-0000-0000-000000000688",
"error_rate": 0.5,
"active_tasks": 1,
"current_task": "stuck",
"uptime_seconds": 99,
"runtime_state": "wedged",
"sample_error": "Control request timeout: initialize",
"runtime_metadata": {
"capabilities": {"heartbeat": true, "scheduler": false},
"idle_timeout_seconds": 600
}
}`
func TestHeartbeatPayload_WedgedRuntimeBodyBinds(t *testing.T) {
var p HeartbeatPayload
if err := bindJSON(t, []byte(runtimeHeartbeatWedgedBody), &p); err != nil {
t.Fatalf("wedged heartbeat body must bind, got: %v", err)
}
if p.RuntimeState != "wedged" {
t.Errorf("runtime_state not decoded: %q", p.RuntimeState)
}
if p.RuntimeMetadata == nil {
t.Fatal("runtime_metadata must decode to a non-nil pointer when present")
}
if got := p.RuntimeMetadata.Capabilities["heartbeat"]; !got {
t.Error("runtime_metadata.capabilities[heartbeat] should be true")
}
if p.RuntimeMetadata.IdleTimeoutSeconds == nil || *p.RuntimeMetadata.IdleTimeoutSeconds != 600 {
t.Error("runtime_metadata.idle_timeout_seconds should decode to 600")
}
}
func TestHeartbeatPayload_MissingWorkspaceID_Rejected(t *testing.T) {
// The drift the producer-side Python test guards: workspace_id renamed/dropped.
const renamed = `{"id":"ws-688","error_rate":0.0,"active_tasks":0}`
var p HeartbeatPayload
if err := bindJSON(t, []byte(renamed), &p); err == nil {
t.Fatal("a heartbeat body missing the required `workspace_id` MUST be rejected (would 400); got nil error")
}
}
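The decode-then-validate path described in the header comment above can be approximated with the standard library alone. The following is a minimal sketch, not the real bind path: `registerBody` and `decodeRegister` are hypothetical names, and the hand-written presence checks stand in for the validator that gin runs on the `binding` tags.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// registerBody is a hypothetical stand-in for RegisterPayload. AgentCard is
// json.RawMessage, so its inner shape is never inspected during decoding.
type registerBody struct {
	ID        string          `json:"id"`
	URL       string          `json:"url"`
	AgentCard json.RawMessage `json:"agent_card"`
}

// decodeRegister decodes the body, then rejects it when a key that this
// sketch treats as required is absent (assumption: id and agent_card).
func decodeRegister(body []byte) (registerBody, error) {
	var p registerBody
	if err := json.Unmarshal(body, &p); err != nil {
		return p, err
	}
	if p.ID == "" {
		return p, errors.New("id is required")
	}
	if len(p.AgentCard) == 0 {
		return p, errors.New("agent_card is required")
	}
	return p, nil
}

func main() {
	ok := []byte(`{"id":"ws-1","url":"https://ws.example/a2a","agent_card":{"name":"pm"}}`)
	noID := []byte(`{"url":"https://ws.example/a2a","agent_card":{"name":"pm"}}`)

	_, err := decodeRegister(ok)
	fmt.Println("full body error:", err)

	_, err = decodeRegister(noID)
	fmt.Println("missing id error:", err)
}
```

The sketch only illustrates why a dropped key surfaces as an error at bind time. The tests above exercise the real path through `binding.JSON.BindBody`.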
@@ -99,10 +99,16 @@ func TestDeriveProvider_UnregisteredErrors(t *testing.T) {
runtime string
model string
}{
// gpt-* is OpenAI — not in claude-code's native set.
// gpt-* is OpenAI — not in claude-code's native set (no openai arm;
// the platform-shared openai vendor is never wired into a BYOK runtime).
{"claude-code", "gpt-5.5"},
// deepseek is a catalog provider but in NO runtime's native set.
{"claude-code", "deepseek-v4-pro"},
// qwen-* is alibaba — a catalog provider NOT wired into claude-code
// (cp#529 wires alibaba only into hermes; claude-code's name-only BYOK
// arms are zai/deepseek/xiaomi-mimo). So it stays unregistered here.
// (NB: deepseek-* IS now routable on claude-code via the deepseek
// name-only arm — see the routability tests — so it is no longer a valid
// "unregistered" example; qwen replaces it.)
{"claude-code", "qwen-max"},
// codex is OpenAI-only — a kimi id is unregistered for it.
{"codex", "kimi-for-coding"},
// a slug no provider in the manifest matches at all.
@@ -16,7 +16,7 @@ const SchemaVersion = 1
// Fingerprint is a stable content hash of the generated projection (schema
// version + provider catalog + runtime native sets). It changes iff the
// registry DATA changes (comment-only YAML edits do not churn it).
const Fingerprint = "ae33546c8fba3474"
const Fingerprint = "ec6b93409e7b9cf8"
// GenProvider is the generated projection of one provider catalog entry —
// the subset a downstream consumer needs to derive + display a provider.
@@ -51,26 +51,31 @@ var Providers = []GenProvider{
{Name: "moonshot", DisplayName: "Moonshot (Kimi)", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"MOONSHOT_API_KEY", "KIMI_API_KEY"}, ModelPrefixMatch: "^moonshot[:/-]", IsPlatform: false, UpstreamVendor: "moonshot"},
{Name: "minimax", DisplayName: "MiniMax", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"MINIMAX_API_KEY", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_API_KEY"}, ModelPrefixMatch: "(?i)^minimax-m", IsPlatform: false, UpstreamVendor: "minimax"},
{Name: "platform", DisplayName: "Platform", Protocol: "anthropic", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"MOLECULE_LLM_USAGE_TOKEN"}, ModelPrefixMatch: "^platform/", IsPlatform: true},
{Name: "xiaomi-mimo", DisplayName: "Xiaomi MiMo", Protocol: "anthropic", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_API_KEY"}, ModelPrefixMatch: "^mimo-", IsPlatform: false},
{Name: "zai", DisplayName: "Z.ai (GLM)", Protocol: "anthropic", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"GLM_API_KEY", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_API_KEY"}, ModelPrefixMatch: "(?i)^glm-", IsPlatform: false},
{Name: "xiaomi-mimo", DisplayName: "Xiaomi MiMo", Protocol: "anthropic", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_API_KEY"}, ModelPrefixMatch: "(?i)^(mimo-|xiaomi[:/])", IsPlatform: false},
{Name: "zai", DisplayName: "Z.ai (GLM)", Protocol: "anthropic", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"GLM_API_KEY", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_API_KEY"}, ModelPrefixMatch: "(?i)^(glm-|zai[:/])", IsPlatform: false},
{Name: "kimi-coding", DisplayName: "Moonshot Kimi (coding-tuned)", Protocol: "anthropic", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"KIMI_API_KEY", "ANTHROPIC_API_KEY", "ANTHROPIC_AUTH_TOKEN"}, ModelPrefixMatch: "^kimi-", IsPlatform: false},
{Name: "deepseek", DisplayName: "DeepSeek", Protocol: "anthropic", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"DEEPSEEK_API_KEY", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_API_KEY"}, ModelPrefixMatch: "^deepseek-", IsPlatform: false},
{Name: "deepseek", DisplayName: "DeepSeek", Protocol: "anthropic", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"DEEPSEEK_API_KEY", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_API_KEY"}, ModelPrefixMatch: "^deepseek[-:/]", IsPlatform: false},
{Name: "google", DisplayName: "Google Gemini", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"GEMINI_API_KEY", "GOOGLE_API_KEY"}, ModelPrefixMatch: "^gemini-", IsPlatform: false},
{Name: "vertex", DisplayName: "Google Vertex AI (keyless ADC)", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"GOOGLE_APPLICATION_CREDENTIALS"}, ModelPrefixMatch: "^vertex:", IsPlatform: false},
{Name: "alibaba", DisplayName: "Alibaba Qwen (DashScope)", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"DASHSCOPE_API_KEY", "ALIBABA_API_KEY"}, ModelPrefixMatch: "^qwen-", IsPlatform: false},
{Name: "nousresearch", DisplayName: "Nous Research (Hermes)", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"NOUSRESEARCH_API_KEY"}, ModelPrefixMatch: "^nousresearch/", IsPlatform: false},
{Name: "openrouter", DisplayName: "OpenRouter (any model)", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"OPENROUTER_API_KEY"}, ModelPrefixMatch: "^openrouter/", IsPlatform: false},
{Name: "huggingface", DisplayName: "Hugging Face Inference", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"HUGGINGFACE_API_KEY", "HF_TOKEN"}, ModelPrefixMatch: "^huggingface/", IsPlatform: false},
{Name: "ai-gateway", DisplayName: "Vercel AI Gateway", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"AI_GATEWAY_API_KEY"}, ModelPrefixMatch: "^ai-gateway/", IsPlatform: false},
{Name: "opencode-zen", DisplayName: "OpenCode Zen", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"OPENCODE_ZEN_API_KEY"}, ModelPrefixMatch: "^opencode-zen/", IsPlatform: false},
{Name: "opencode-go", DisplayName: "OpenCode Go", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"OPENCODE_GO_API_KEY"}, ModelPrefixMatch: "^opencode-go/", IsPlatform: false},
{Name: "kilocode", DisplayName: "Kilo Code", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"KILOCODE_API_KEY"}, ModelPrefixMatch: "^kilocode/", IsPlatform: false},
{Name: "minimax-cn", DisplayName: "MiniMax China", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"MINIMAX_API_KEY", "ANTHROPIC_AUTH_TOKEN"}, ModelPrefixMatch: "^minimax-cn/", IsPlatform: false},
{Name: "ollama-cloud", DisplayName: "Ollama Cloud", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"OLLAMA_CLOUD_API_KEY"}, ModelPrefixMatch: "^ollama-cloud/", IsPlatform: false},
{Name: "alibaba", DisplayName: "Alibaba Qwen (DashScope)", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"DASHSCOPE_API_KEY", "ALIBABA_API_KEY"}, ModelPrefixMatch: "(?i)^(qwen|alibaba[:/])", IsPlatform: false},
{Name: "nousresearch", DisplayName: "Nous Research (Hermes)", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"NOUSRESEARCH_API_KEY"}, ModelPrefixMatch: "^nousresearch[:/]", IsPlatform: false},
{Name: "openrouter", DisplayName: "OpenRouter (any model)", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"OPENROUTER_API_KEY"}, ModelPrefixMatch: "^openrouter[:/]", IsPlatform: false},
{Name: "huggingface", DisplayName: "Hugging Face Inference", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"HUGGINGFACE_API_KEY", "HF_TOKEN"}, ModelPrefixMatch: "^huggingface[:/]", IsPlatform: false},
{Name: "ai-gateway", DisplayName: "Vercel AI Gateway", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"AI_GATEWAY_API_KEY"}, ModelPrefixMatch: "^ai-gateway[:/]", IsPlatform: false},
{Name: "opencode-zen", DisplayName: "OpenCode Zen", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"OPENCODE_ZEN_API_KEY"}, ModelPrefixMatch: "^opencode-zen[:/]", IsPlatform: false},
{Name: "opencode-go", DisplayName: "OpenCode Go", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"OPENCODE_GO_API_KEY"}, ModelPrefixMatch: "^opencode-go[:/]", IsPlatform: false},
{Name: "kilocode", DisplayName: "Kilo Code", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"KILOCODE_API_KEY"}, ModelPrefixMatch: "^kilocode[:/]", IsPlatform: false},
{Name: "minimax-cn", DisplayName: "MiniMax China", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"MINIMAX_API_KEY", "ANTHROPIC_AUTH_TOKEN"}, ModelPrefixMatch: "^minimax-cn[:/]", IsPlatform: false},
{Name: "ollama-cloud", DisplayName: "Ollama Cloud", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"OLLAMA_CLOUD_API_KEY"}, ModelPrefixMatch: "^ollama-cloud[:/]", IsPlatform: false},
{Name: "ollama", DisplayName: "Ollama (self-hosted)", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"OLLAMA_HOST"}, ModelPrefixMatch: "^ollama/", IsPlatform: false},
{Name: "nvidia", DisplayName: "NVIDIA NIM", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"NVIDIA_API_KEY"}, ModelPrefixMatch: "^nvidia/", IsPlatform: false},
{Name: "arcee", DisplayName: "Arcee", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"ARCEE_API_KEY"}, ModelPrefixMatch: "^arcee/", IsPlatform: false},
{Name: "custom", DisplayName: "Custom OpenAI-compat endpoint", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"CUSTOM_API_KEY", "OPENAI_API_KEY"}, ModelPrefixMatch: "^custom/", IsPlatform: false},
{Name: "nvidia", DisplayName: "NVIDIA NIM", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"NVIDIA_API_KEY"}, ModelPrefixMatch: "^nvidia[:/]", IsPlatform: false},
{Name: "arcee", DisplayName: "Arcee", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"ARCEE_API_KEY"}, ModelPrefixMatch: "^arcee[:/]", IsPlatform: false},
{Name: "custom", DisplayName: "Custom OpenAI-compat endpoint", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"CUSTOM_API_KEY", "OPENAI_API_KEY"}, ModelPrefixMatch: "^custom[:/]", IsPlatform: false},
{Name: "byok-anthropic", DisplayName: "Anthropic (BYOK)", Protocol: "anthropic", AuthMode: "anthropic_api", AuthEnv: []string{"ANTHROPIC_API_KEY"}, ModelPrefixMatch: "^anthropic/", IsPlatform: false},
{Name: "byok-openai", DisplayName: "OpenAI (BYOK)", Protocol: "openai", AuthMode: "anthropic_api", AuthEnv: []string{"OPENAI_API_KEY"}, ModelPrefixMatch: "^openai[:/]", IsPlatform: false},
{Name: "byok-gemini", DisplayName: "Google Gemini (BYOK)", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"GEMINI_API_KEY", "GOOGLE_API_KEY"}, ModelPrefixMatch: "^gemini/", IsPlatform: false},
{Name: "byok-minimax", DisplayName: "MiniMax (BYOK)", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"MINIMAX_API_KEY"}, ModelPrefixMatch: "(?i)^(minimax[:/]|codex-minimax-)", IsPlatform: false},
{Name: "groq", DisplayName: "Groq", Protocol: "openai", AuthMode: "third_party_anthropic_compat", AuthEnv: []string{"GROQ_API_KEY"}, ModelPrefixMatch: "^groq:", IsPlatform: false},
}
// Runtimes maps each runtime to its native provider+model set, runtime names
@@ -82,11 +87,15 @@ var Runtimes = map[string][]GenRuntimeRef{
{Name: "kimi-coding", Models: []string{"kimi-for-coding", "kimi-k2.5", "kimi-k2", "moonshot:kimi-k2.6", "moonshot:kimi-k2.5"}},
{Name: "minimax", Models: []string{"MiniMax-M2", "MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M3", "minimax:MiniMax-M2", "minimax:MiniMax-M2.7", "minimax:MiniMax-M2.7-highspeed", "minimax:MiniMax-M3"}},
{Name: "platform", Models: []string{"anthropic/claude-opus-4-7", "anthropic/claude-sonnet-4-6", "moonshot/kimi-k2.6", "moonshot/kimi-k2.5", "minimax/MiniMax-M2.7", "minimax/MiniMax-M2.7-highspeed", "minimax/MiniMax-M3"}},
{Name: "zai", Models: []string{}},
{Name: "deepseek", Models: []string{}},
{Name: "xiaomi-mimo", Models: []string{}},
},
"codex": {
{Name: "openai-subscription", Models: []string{"gpt-5.5", "gpt-5.4", "gpt-5.4-mini", "gpt-5.3-codex", "gpt-5.3-codex-spark", "gpt-5.2"}},
{Name: "openai-api", Models: []string{"gpt-5.5", "gpt-5.4", "gpt-5.4-mini", "gpt-5.3-codex", "gpt-5.3-codex-spark", "gpt-5.2"}},
{Name: "platform", Models: []string{"openai/gpt-5.4", "openai/gpt-5.4-mini"}},
{Name: "byok-minimax", Models: []string{}},
},
"google-adk": {
{Name: "platform", Models: []string{"platform:gemini-2.5-pro", "platform:gemini-2.5-flash"}},
@@ -95,9 +104,34 @@ var Runtimes = map[string][]GenRuntimeRef{
"hermes": {
{Name: "kimi-coding", Models: []string{"kimi-coding/kimi-k2"}},
{Name: "platform", Models: []string{"moonshot/kimi-k2.6", "moonshot/kimi-k2.5"}},
{Name: "openrouter", Models: []string{}},
{Name: "huggingface", Models: []string{}},
{Name: "ai-gateway", Models: []string{}},
{Name: "opencode-zen", Models: []string{}},
{Name: "opencode-go", Models: []string{}},
{Name: "kilocode", Models: []string{}},
{Name: "custom", Models: []string{}},
{Name: "nvidia", Models: []string{}},
{Name: "arcee", Models: []string{}},
{Name: "ollama-cloud", Models: []string{}},
{Name: "minimax-cn", Models: []string{}},
{Name: "nousresearch", Models: []string{}},
{Name: "deepseek", Models: []string{}},
{Name: "zai", Models: []string{}},
{Name: "xiaomi-mimo", Models: []string{}},
{Name: "alibaba", Models: []string{}},
{Name: "byok-anthropic", Models: []string{}},
{Name: "byok-gemini", Models: []string{}},
{Name: "byok-openai", Models: []string{}},
{Name: "byok-minimax", Models: []string{}},
},
"openclaw": {
{Name: "kimi-coding", Models: []string{"moonshot:kimi-k2.6", "moonshot:kimi-k2.5"}},
{Name: "platform", Models: []string{"moonshot/kimi-k2.6", "moonshot/kimi-k2.5"}},
{Name: "openrouter", Models: []string{}},
{Name: "custom", Models: []string{}},
{Name: "byok-openai", Models: []string{}},
{Name: "byok-minimax", Models: []string{}},
{Name: "groq", Models: []string{}},
},
}
@@ -257,9 +257,20 @@ func parseManifest(raw []byte) (*Manifest, error) {
return nil, fmt.Errorf("providers: runtime %q references provider %q twice", rt, ref.Name)
}
refSeen[ref.Name] = struct{}{}
if len(ref.Models) == 0 {
return nil, fmt.Errorf("providers: runtime %q provider %q has no model ids", rt, ref.Name)
}
// A NAME-ONLY arm (zero model ids) is permitted (cp#529): it adds
// NOTHING to the runtime's platform menu (ModelsForRuntime only
// iterates ref.Models, so an empty Models contributes no selectable
// id — additive, zero platform-menu change) yet wires the provider
// into the runtime's NATIVE prefix-routing set, so a BYOK id the
// provider's model_prefix_match matches becomes routable via
// DeriveProvider step-4. This is the mechanism the cp#529
// routability-aware enforcer keys off: a name-only BYOK arm makes a
// passthrough id (openrouter/…, deepseek-…, etc.) resolve to a
// concrete provider without ever appearing on the platform menu.
// BILLING GUARDRAIL: only CONFIRMED-NON-PLATFORM (BYOK) providers
// are wired as name-only arms — never `platform`/anthropic-*/
// openai-*/moonshot/minimax/google/vertex — so a name-only arm can
// never route a customer model through the platform's key.
}
}
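The effect of a name-only arm described in the comment above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: `provider`, `runtimeRef`, `menuModels`, and `routeByPrefix` are hypothetical names, and the real `ModelsForRuntime` and `DeriveProvider` may differ in signature and in the steps they run.

```go
package main

import (
	"fmt"
	"regexp"
)

// provider and runtimeRef are hypothetical stand-ins for the manifest types.
type provider struct {
	Name        string
	PrefixMatch *regexp.Regexp
}

type runtimeRef struct {
	Name   string
	Models []string // empty for a name-only arm
}

// menuModels collects the selectable ids. It only iterates Models, so a
// name-only arm contributes nothing to the menu.
func menuModels(refs []runtimeRef) []string {
	var out []string
	for _, ref := range refs {
		out = append(out, ref.Models...)
	}
	return out
}

// routeByPrefix returns the single native provider whose prefix matcher
// accepts the id, or an error when zero or several providers match.
func routeByPrefix(refs []runtimeRef, catalog map[string]provider, id string) (string, error) {
	var hits []string
	for _, ref := range refs {
		p, ok := catalog[ref.Name]
		if ok && p.PrefixMatch.MatchString(id) {
			hits = append(hits, p.Name)
		}
	}
	switch len(hits) {
	case 1:
		return hits[0], nil
	case 0:
		return "", fmt.Errorf("no native provider matches %q", id)
	default:
		return "", fmt.Errorf("id %q matches %d native providers", id, len(hits))
	}
}

// exampleCatalog and exampleRefs build a two-provider runtime for the demo.
func exampleCatalog() map[string]provider {
	return map[string]provider{
		"kimi-coding": {Name: "kimi-coding", PrefixMatch: regexp.MustCompile(`^kimi-`)},
		"deepseek":    {Name: "deepseek", PrefixMatch: regexp.MustCompile(`^deepseek[-:/]`)},
	}
}

func exampleRefs() []runtimeRef {
	return []runtimeRef{
		{Name: "kimi-coding", Models: []string{"kimi-for-coding"}},
		{Name: "deepseek"}, // name-only arm: no models listed
	}
}

func main() {
	refs, catalog := exampleRefs(), exampleCatalog()
	fmt.Println("menu:", menuModels(refs))

	name, err := routeByPrefix(refs, catalog, "deepseek-v4-pro")
	fmt.Println("route:", name, err)
}
```

In this sketch the `deepseek` arm leaves the menu unchanged while still making `deepseek-v4-pro` routable, which is the additive behaviour the comment relies on.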
@@ -317,7 +317,7 @@ providers:
# Adapter prefix "mimo-"; canvas /^mimo-/i. proxy routing TBD (PR-3).
# NOTE: canvas has a duplicate "xiaomi" VENDOR_LABELS key aliasing the
# same vendor — collapsed into this one entry.
model_prefix_match: "^mimo-"
model_prefix_match: "(?i)^(mimo-|xiaomi[:/])"
model_aliases: []
# ===========================================================================
@@ -334,7 +334,7 @@ providers:
auth_token_env: ANTHROPIC_AUTH_TOKEN
# Adapter prefix "glm-" (lowercased match catches GLM-4.6); canvas /^GLM-/i.
# canvas-only + adapter-only today; proxy routing TBD (PR-3).
model_prefix_match: "(?i)^glm-"
model_prefix_match: "(?i)^(glm-|zai[:/])"
model_aliases: []
# ===========================================================================
@@ -385,7 +385,7 @@ providers:
auth_token_env: ANTHROPIC_AUTH_TOKEN
# Adapter prefix "deepseek-"; canvas /^deepseek-/i. adapter+canvas only;
# proxy routing TBD (PR-3).
model_prefix_match: "^deepseek-"
model_prefix_match: "^deepseek[-:/]"
model_aliases: []
# ===========================================================================
@@ -452,7 +452,7 @@ providers:
auth_env: [DASHSCOPE_API_KEY, ALIBABA_API_KEY]
auth_token_env: ANTHROPIC_AUTH_TOKEN
# canvas-only today; proxy routing TBD. canvas /^qwen-/i.
model_prefix_match: "^qwen-"
model_prefix_match: "(?i)^(qwen|alibaba[:/])"
model_aliases: []
- name: nousresearch
@@ -466,7 +466,7 @@ providers:
auth_token_env: ANTHROPIC_AUTH_TOKEN
# canvas-only today; proxy routing TBD. Slash-prefix id
# (e.g. nousresearch/hermes-4-70b).
model_prefix_match: "^nousresearch/"
model_prefix_match: "^nousresearch[:/]"
model_aliases: []
- name: openrouter
@@ -479,7 +479,7 @@ providers:
auth_env: [OPENROUTER_API_KEY]
auth_token_env: ANTHROPIC_AUTH_TOKEN
# canvas-only today; proxy routing TBD. Wildcard: openrouter/<model>.
model_prefix_match: "^openrouter/"
model_prefix_match: "^openrouter[:/]"
model_aliases: []
- name: huggingface
@@ -492,7 +492,7 @@ providers:
auth_env: [HUGGINGFACE_API_KEY, HF_TOKEN]
auth_token_env: ANTHROPIC_AUTH_TOKEN
# canvas-only today; proxy routing TBD. Wildcard: huggingface/<model>.
model_prefix_match: "^huggingface/"
model_prefix_match: "^huggingface[:/]"
model_aliases: []
- name: ai-gateway
@@ -505,7 +505,7 @@ providers:
auth_env: [AI_GATEWAY_API_KEY]
auth_token_env: ANTHROPIC_AUTH_TOKEN
# canvas-only today; proxy routing TBD.
model_prefix_match: "^ai-gateway/"
model_prefix_match: "^ai-gateway[:/]"
model_aliases: []
- name: opencode-zen
@@ -518,7 +518,7 @@ providers:
auth_env: [OPENCODE_ZEN_API_KEY]
auth_token_env: ANTHROPIC_AUTH_TOKEN
# canvas-only today; proxy routing TBD.
model_prefix_match: "^opencode-zen/"
model_prefix_match: "^opencode-zen[:/]"
model_aliases: []
- name: opencode-go
@@ -531,7 +531,7 @@ providers:
auth_env: [OPENCODE_GO_API_KEY]
auth_token_env: ANTHROPIC_AUTH_TOKEN
# canvas-only today; proxy routing TBD.
model_prefix_match: "^opencode-go/"
model_prefix_match: "^opencode-go[:/]"
model_aliases: []
- name: kilocode
@@ -544,7 +544,7 @@ providers:
auth_env: [KILOCODE_API_KEY]
auth_token_env: ANTHROPIC_AUTH_TOKEN
# canvas-only today; proxy routing TBD.
model_prefix_match: "^kilocode/"
model_prefix_match: "^kilocode[:/]"
model_aliases: []
- name: minimax-cn
@@ -559,7 +559,7 @@ providers:
# canvas-only today; proxy routing TBD. China endpoint sibling of `minimax`
# (api.minimaxi.com). Matched only by the explicit `minimax-cn:` / `minimax-cn/`
# prefix so it does NOT collide with `minimax`'s (?i)^minimax-m in the overlap guard.
model_prefix_match: "^minimax-cn/"
model_prefix_match: "^minimax-cn[:/]"
model_aliases: []
- name: ollama-cloud
@@ -572,7 +572,7 @@ providers:
auth_env: [OLLAMA_CLOUD_API_KEY]
auth_token_env: ANTHROPIC_AUTH_TOKEN
# canvas-only today; proxy routing TBD.
model_prefix_match: "^ollama-cloud/"
model_prefix_match: "^ollama-cloud[:/]"
model_aliases: []
- name: ollama
@@ -598,7 +598,7 @@ providers:
auth_env: [NVIDIA_API_KEY]
auth_token_env: ANTHROPIC_AUTH_TOKEN
# canvas-only today; proxy routing TBD.
model_prefix_match: "^nvidia/"
model_prefix_match: "^nvidia[:/]"
model_aliases: []
- name: arcee
@@ -611,7 +611,7 @@ providers:
auth_env: [ARCEE_API_KEY]
auth_token_env: ANTHROPIC_AUTH_TOKEN
# canvas-only today; proxy routing TBD.
model_prefix_match: "^arcee/"
model_prefix_match: "^arcee[:/]"
model_aliases: []
- name: custom
@@ -624,7 +624,109 @@ providers:
auth_env: [CUSTOM_API_KEY, OPENAI_API_KEY]
auth_token_env: ANTHROPIC_AUTH_TOKEN
# canvas-only today; proxy routing TBD. Wildcard free-text: custom/<model>.
model_prefix_match: "^custom/"
model_prefix_match: "^custom[:/]"
model_aliases: []
# ===========================================================================
# DEDICATED BYOK-VENDOR providers (cp#529). These exist so the NAMESPACED
# BYOK ids the hermes/openclaw/codex templates offer for the SHARED upstream
# vendors (anthropic, openai, gemini, minimax, groq) become routable with the
# TENANT's OWN vendor key — WITHOUT routing them through the platform-shared
# `platform` provider (which would bill the platform's key: a money bug).
#
# Each is NON-PLATFORM (name != "platform") -> IsPlatform()==false -> BYOK
# billing: the workspace env supplies the vendor key, never the platform key.
#
# COLLISION-FREE BY CONSTRUCTION: every matcher is NAMESPACED (anchored on the
# `vendor/` slash form or `vendor:` colon form) so it is DISJOINT from the
# platform vendors' BARE matchers (anthropic-api `^claude`, openai-subscription
# `^gpt-`, openai-api `^openai-api[:/]`, minimax `(?i)^minimax-m`,
# google `^gemini-`, minimax-cn `^minimax-cn[:/]`). DeriveProvider's overlap
# guard (no slug may match two native providers) stays green — verified for all
# 20 residual ids (cp#529).
#
# These siblings of the platform/upstream vendor entries point at the SAME
# PUBLIC upstream base URLs, but carry NO upstream_vendor (they are BYOK
# passthroughs, not proxy upstream targets — the proxy never dials a tenant's
# own key) and use the namespaced matchers above instead of the bare proxy
# prefixes.
# ===========================================================================
- name: byok-anthropic
display_name: "Anthropic (BYOK)"
vendor_logo: "anthropic"
protocol: anthropic
auth_mode: anthropic_api
base_url_template: "https://api.anthropic.com/v1"
base_url_anthropic: "https://api.anthropic.com/v1"
auth_env: [ANTHROPIC_API_KEY]
auth_token_env: ANTHROPIC_API_KEY
# Namespaced BYOK form `anthropic/<model>` (hermes). DISJOINT from
# anthropic-api's bare `^claude` and anthropic-oauth's alias set.
model_prefix_match: "^anthropic/"
model_aliases: []
- name: byok-openai
display_name: "OpenAI (BYOK)"
vendor_logo: "openai"
protocol: openai
auth_mode: anthropic_api # openai-protocol; auth is a bearer API key.
base_url_template: "https://api.openai.com/v1"
base_url_anthropic: null
auth_env: [OPENAI_API_KEY]
auth_token_env: OPENAI_API_KEY
# Namespaced BYOK forms `openai/<model>` (hermes) + `openai:<model>`
# (openclaw). DISJOINT from openai-subscription's bare `^gpt-` and
# openai-api's `^openai-api[:/]` (the dash after `openai` keeps the two
# apart: `openai:` / `openai/` never start with `openai-api`).
model_prefix_match: "^openai[:/]"
model_aliases: []
- name: byok-gemini
display_name: "Google Gemini (BYOK)"
vendor_logo: "google"
protocol: openai
auth_mode: third_party_anthropic_compat
base_url_template: "https://generativelanguage.googleapis.com/v1beta/openai"
base_url_anthropic: null
auth_env: [GEMINI_API_KEY, GOOGLE_API_KEY]
auth_token_env: ANTHROPIC_AUTH_TOKEN
# Namespaced BYOK form `gemini/<model>` (hermes). DISJOINT from the `google`
# vendor's bare `^gemini-` and `vertex`'s `^vertex:`.
model_prefix_match: "^gemini/"
model_aliases: []
- name: byok-minimax
display_name: "MiniMax (BYOK)"
vendor_logo: "minimax"
protocol: openai
auth_mode: third_party_anthropic_compat
base_url_template: "https://api.minimax.io/v1"
base_url_anthropic: null
auth_env: [MINIMAX_API_KEY]
auth_token_env: ANTHROPIC_AUTH_TOKEN
# Namespaced BYOK forms `minimax:<model>` (openclaw) + `minimax/<model>`
# (hermes), PLUS the codex-runtime alias `codex-minimax-m2.7` (the codex
# template's `minimax-token-plan` route — same upstream api.minimax.io,
# tenant MINIMAX_API_KEY). The `codex-minimax-` leg is NARROWLY anchored so
# it resolves that one codex id WITHOUT a broad matcher: it is DISJOINT from
# `minimax` (?i)^minimax-m (which needs `minimax-m`, not `codex-`) and from
# `minimax-cn` ^minimax-cn[:/]. Verified collision-free for all 20 residual
# ids + codex-minimax-m2.7 (cp#529).
model_prefix_match: "(?i)^(minimax[:/]|codex-minimax-)"
model_aliases: []
- name: groq
display_name: "Groq"
vendor_logo: "groq"
protocol: openai
auth_mode: third_party_anthropic_compat
base_url_template: "https://api.groq.com/openai/v1"
base_url_anthropic: null
auth_env: [GROQ_API_KEY]
auth_token_env: ANTHROPIC_AUTH_TOKEN
# Namespaced BYOK form `groq:<model>` (openclaw). No other provider matches
# the `groq:` prefix.
model_prefix_match: "^groq:"
model_aliases: []
# =============================================================================
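The disjointness claim for the namespaced matchers can be spot-checked with a few lines of Go. This is a minimal sketch: the patterns are copied from the provider entries and comments above, while the flat `matchers` map, the `matching` and `onlyMatch` helpers, and some sample ids (for example `openai-api/gpt-5.4` and `minimax-cn/some-model`) are assumptions made for illustration. The real overlap guard works per runtime native set rather than over one flat map.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// matchers holds a subset of the model_prefix_match patterns quoted above.
var matchers = map[string]*regexp.Regexp{
	"openai-subscription": regexp.MustCompile(`^gpt-`),
	"openai-api":          regexp.MustCompile(`^openai-api[:/]`),
	"byok-openai":         regexp.MustCompile(`^openai[:/]`),
	"minimax":             regexp.MustCompile(`(?i)^minimax-m`),
	"minimax-cn":          regexp.MustCompile(`^minimax-cn[:/]`),
	"byok-minimax":        regexp.MustCompile(`(?i)^(minimax[:/]|codex-minimax-)`),
	"google":              regexp.MustCompile(`^gemini-`),
	"byok-gemini":         regexp.MustCompile(`^gemini/`),
}

// matching returns the sorted names of every provider whose pattern accepts id.
func matching(id string) []string {
	var names []string
	for name, re := range matchers {
		if re.MatchString(id) {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

// onlyMatch returns the provider name when exactly one pattern accepts id,
// and the empty string otherwise (no match, or an overlap).
func onlyMatch(id string) string {
	names := matching(id)
	if len(names) == 1 {
		return names[0]
	}
	return ""
}

func main() {
	ids := []string{
		"openai:gpt-5.4",
		"openai-api/gpt-5.4",
		"minimax/MiniMax-M3",
		"MiniMax-M2.7",
		"codex-minimax-m2.7",
		"gemini/gemini-2.5-pro",
	}
	for _, id := range ids {
		fmt.Printf("%-24s -> %v\n", id, matching(id))
	}
}
```

Each sample id resolves to a single provider here because the BYOK patterns require a `:` or `/` directly after the vendor name, while the bare patterns require a `-`.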
@@ -762,6 +864,16 @@ runtimes:
- minimax/MiniMax-M2.7
- minimax/MiniMax-M2.7-highspeed
- minimax/MiniMax-M3
# NAME-ONLY BYOK arms (cp#529): zero model ids → add NOTHING to the
# platform menu (ModelsForRuntime), but wire these CONFIRMED-NON-PLATFORM
# providers into claude-code's NATIVE prefix-routing set so the bare BYOK
# ids the claude-code template offers (GLM-*, deepseek-*, mimo-*) resolve
# via DeriveProvider. BILLING-SAFE: zai/deepseek/xiaomi-mimo are tenant-key
# (BYOK) providers — never platform-shared — so routing through them bills
# the tenant's own key, never the platform's.
- name: zai
- name: deepseek
- name: xiaomi-mimo
# hermes: native Kimi only (kimi-coding gateway). hermes-agent owns its own
# broad provider matrix, but the CTO native matrix for the Molecule
@@ -777,6 +889,38 @@ runtimes:
models:
- moonshot/kimi-k2.6
- moonshot/kimi-k2.5
# NAME-ONLY BYOK arms (cp#529): zero model ids → no addition to the
# platform menu, but wire hermes's CONFIRMED-NON-PLATFORM passthrough +
# bare-vendor providers into its NATIVE prefix-routing set so the BYOK
# ids the hermes template offers (openrouter/…, huggingface/…, deepseek/…,
# zai:…, etc.) resolve via DeriveProvider. ALL tenant-key (BYOK).
- name: openrouter
- name: huggingface
- name: ai-gateway
- name: opencode-zen
- name: opencode-go
- name: kilocode
- name: custom
- name: nvidia
- name: arcee
- name: ollama-cloud
- name: minimax-cn
- name: nousresearch
- name: deepseek
- name: zai
- name: xiaomi-mimo
- name: alibaba
# DEDICATED BYOK-VENDOR arms (cp#529): the namespaced ids hermes offers for
# the SHARED upstream vendors (anthropic/claude-*, gemini/*, openai/*,
# minimax/*) NOW resolve to these tenant-key BYOK-vendor providers — NOT
# the platform-shared `platform` provider (which would bill the platform's
# key). NAME-ONLY (no models) → no platform-menu change, prefix-routing
# only, BYOK-billed. This converts the last 12 hermes residual ids from
# cp#529 drift to routable.
- name: byok-anthropic
- name: byok-gemini
- name: byok-openai
- name: byok-minimax
# codex: OpenAI — BYOK split across TWO native providers
# (openai-subscription + openai-api), mirroring claude-code's anthropic
@@ -828,6 +972,14 @@ runtimes:
models:
- openai/gpt-5.4
- openai/gpt-5.4-mini
# NAME-ONLY BYOK arm (cp#529): the codex template offers a BYOK MiniMax
# token-plan model `codex-minimax-m2.7` (its `minimax-token-plan` provider:
# base_url api.minimax.io, tenant MINIMAX_API_KEY, model_id_override
# codex-MiniMax-M2.7). It resolves to byok-minimax via the narrowly-anchored
# `codex-minimax-` leg of byok-minimax's matcher (same upstream, tenant key)
# — NOT a broad matcher. NAME-ONLY → no platform-menu change, BYOK-billed.
# Converts the last codex residual id from cp#529 drift to routable.
- name: byok-minimax
# openclaw: native Kimi only. openclaw's moonshot: model prefix + a
# KIMI_API_KEY (sk-kimi-*) routes to api.kimi.com/coding (kimi-for-coding),
@@ -846,6 +998,21 @@ runtimes:
models:
- moonshot/kimi-k2.6
- moonshot/kimi-k2.5
# NAME-ONLY BYOK arms (cp#529): zero model ids → no platform-menu change,
# but wire openclaw's CONFIRMED-NON-PLATFORM passthroughs into its NATIVE
# prefix-routing set so the BYOK colon/slash ids the openclaw template
# offers (openrouter:…, custom:…) resolve via DeriveProvider. BYOK only.
- name: openrouter
- name: custom
# DEDICATED BYOK-VENDOR arms (cp#529): openclaw's default model is
# `minimax:MiniMax-M2.7`, plus it offers `openai:*` and `groq:*` BYOK ids.
# These NOW resolve to the tenant-key BYOK-vendor providers (NOT the
# platform key). NAME-ONLY → prefix-routing only, BYOK-billed. This converts
# the last 7 openclaw residual ids from cp#529 drift to routable AND makes
# the runtime's DEFAULT model (minimax:MiniMax-M2.7) resolve.
- name: byok-openai
- name: byok-minimax
- name: groq
# google-adk: Gemini via Vertex AI, keyless ADC (Workload Identity
@@ -17,19 +17,39 @@ import (
// of its native vendors the proxy can serve — kimi for hermes/openclaw,
// openai for codex, anthropic+kimi+minimax for claude-code.
//
// cp#529 adds NAME-ONLY BYOK arms (zero model ids) to claude-code/hermes/codex/
// openclaw: they add NOTHING to the platform menu (ModelsForRuntime) but wire
// CONFIRMED-NON-PLATFORM providers into the runtime's NATIVE prefix-routing set
// so a matching BYOK id resolves via DeriveProvider. ProvidersForRuntime returns
// the full native arm set (menu + name-only), so the expected sets below include
// them. The platform-shared/denylist providers are NEVER wired into a BYOK arm.
//
// claude-code -> anthropic (oauth+api), kimi (kimi-coding), minimax, platform
// + BYOK name-only: zai, deepseek, xiaomi-mimo
// hermes -> kimi (kimi-coding), platform
// codex -> openai (subscription + api), platform
// openclaw -> kimi (kimi-coding), platform
// + BYOK name-only: openrouter, huggingface, ai-gateway,
// opencode-zen, opencode-go, kilocode, custom, nvidia, arcee,
// ollama-cloud, minimax-cn, nousresearch, deepseek, zai,
// xiaomi-mimo, alibaba, byok-anthropic, byok-gemini,
// byok-openai, byok-minimax
// codex -> openai (subscription + api), platform
// + BYOK name-only: byok-minimax
// openclaw -> kimi (kimi-coding), platform
// + BYOK name-only: openrouter, custom, byok-openai,
// byok-minimax, groq
var runtimeNativeProviders = map[string][]string{
"claude-code": {"anthropic-api", "anthropic-oauth", "kimi-coding", "minimax", "platform"},
"hermes": {"kimi-coding", "platform"},
"claude-code": {"anthropic-api", "anthropic-oauth", "kimi-coding", "minimax", "platform", "zai", "deepseek", "xiaomi-mimo"},
"hermes": {"kimi-coding", "platform",
"openrouter", "huggingface", "ai-gateway", "opencode-zen", "opencode-go",
"kilocode", "custom", "nvidia", "arcee", "ollama-cloud", "minimax-cn",
"nousresearch", "deepseek", "zai", "xiaomi-mimo", "alibaba",
// cp#529 dedicated BYOK-vendor name-only arms (shared-vendor namespaced ids).
"byok-anthropic", "byok-gemini", "byok-openai", "byok-minimax"},
// codex's OpenAI BYOK is split across the OAuth subscription arm
// (openai-subscription) and the direct-key arm (openai-api), mirroring
// claude-code's anthropic oauth+api split; platform openai via the proxy
// Responses surface.
"codex": {"openai-subscription", "openai-api", "platform"},
"openclaw": {"kimi-coding", "platform"},
// Responses surface. cp#529 adds the byok-minimax name-only arm so the
// template's BYOK MiniMax token-plan id (codex-minimax-m2.7) resolves.
"codex": {"openai-subscription", "openai-api", "platform", "byok-minimax"},
"openclaw": {"kimi-coding", "platform", "openrouter", "custom",
// cp#529 dedicated BYOK-vendor name-only arms (openai:/minimax:/groq:).
"byok-openai", "byok-minimax", "groq"},
}
func sortedCopy(in []string) []string {
@@ -253,6 +273,56 @@ func TestParseManifest_ValidBaseline(t *testing.T) {
}
}
// TestParseManifest_NameOnlyArm proves a NAME-ONLY runtime arm (zero model
// ids) is PERMITTED (cp#529) and is additive: it contributes nothing to the
// runtime's platform menu (ModelsForRuntime) yet wires the provider into the
// runtime's NATIVE prefix-routing set so a matching BYOK id resolves via
// DeriveProvider. This is the loader half of the cp#529 routability change.
func TestParseManifest_NameOnlyArm(t *testing.T) {
const y = `
schema_version: 1
providers:
- name: openai
display_name: "OpenAI"
protocol: openai
auth_mode: anthropic_api
auth_env: [OPENAI_API_KEY]
model_prefix_match: "^gpt-"
- name: openrouter
display_name: "OpenRouter"
protocol: openai
auth_mode: third_party_anthropic_compat
auth_env: [OPENROUTER_API_KEY]
model_prefix_match: "^openrouter[:/]"
runtimes:
codex:
providers:
- name: openai
models: [gpt-5.5]
- name: openrouter
`
m, err := parseManifest([]byte(y))
if err != nil {
t.Fatalf("parseManifest(name-only arm) error = %v; want nil (name-only arms are permitted)", err)
}
// The name-only arm adds NOTHING to the platform menu.
models, err := m.ModelsForRuntime("codex")
if err != nil {
t.Fatalf("ModelsForRuntime(codex) error = %v", err)
}
if len(models) != 1 || models[0] != "gpt-5.5" {
t.Fatalf("ModelsForRuntime(codex) = %v; want [gpt-5.5] (name-only arm must not add a menu id)", models)
}
// …yet a BYOK id matching the name-only arm's prefix now ROUTES.
p, err := m.DeriveProvider("codex", "openrouter/anthropic/claude-3.5-sonnet", nil)
if err != nil {
t.Fatalf("DeriveProvider(codex, openrouter/…) error = %v; want it to resolve via the name-only arm", err)
}
if p.Name != "openrouter" {
t.Fatalf("DeriveProvider resolved to %q; want openrouter", p.Name)
}
}
// TestParseManifest_FailDirection is the load-bearing-guard proof: each case
// breaks the manifest in one way and asserts the matching error fires. If a
// future edit removes a guard, the corresponding case flips red.
@@ -287,19 +357,6 @@ runtimes:
`,
wantErr: "empty native provider set",
},
{
name: "provider ref with no models",
yaml: `
schema_version: 1
providers:
- {name: openai, display_name: "OpenAI", protocol: openai, auth_mode: anthropic_api, auth_env: [OPENAI_API_KEY], model_prefix_match: "^gpt-"}
runtimes:
codex:
providers:
- {name: openai, models: []}
`,
wantErr: "no model ids",
},
{
name: "duplicate provider ref",
yaml: `
@@ -29,7 +29,7 @@ import (
// canonicalProvidersYAMLSHA256 is the sha256 of the canonical providers.yaml as
// synced from molecule-controlplane. Bumped deliberately on each re-sync (see
// file doc). Cross-checked live by the sync-providers-yaml CI workflow.
const canonicalProvidersYAMLSHA256 = "8e19aaf8a2a37cdd109184ae80ca223ce0a0ce0ed30299a52aa990271da5af7a"
const canonicalProvidersYAMLSHA256 = "846ddef11ec423ebf2e96b5da21bd89129dbc3f0a2d14ac086940e005c079387"
func TestSyncedYAMLMatchesCanonicalSHA(t *testing.T) {
sum := sha256.Sum256(embeddedYAML)
@@ -0,0 +1,178 @@
package registry
// cp_instance_reconciler.go — authoritative EC2-state reconcile for
// SaaS workspaces (core#2261).
//
// Root cause (core#2247): every existing liveness pass keys off a PROXY
// for "is this workspace alive?":
//
// - StartLivenessMonitor — Redis TTL expiry (agent stopped heartbeating).
// - StartHealthSweep (Docker pass) — local Docker daemon (prov != nil only).
// - StartHealthSweep (remote pass) — last_heartbeat_at freshness for
// runtime='external' rows.
// - StartCPOrphanSweeper — status='removed' rows with a stray instance_id.
//
// A SaaS claude-code workspace whose EC2 was terminated/stopped out from
// under us (manual AWS action, spot reclaim, CP-side reap, etc.) falls
// through ALL of them: it's not 'removed' (so the orphan sweeper skips
// it), it's not runtime='external' (so the heartbeat pass skips it), and
// on a pure-SaaS front-door prov == nil so the Docker pass never runs.
// The registry kept status='online' pointing at a dead instance forever.
//
// This sweeper closes that gap with the ONE authoritative check the
// others lack: CPProvisioner.IsRunning, which ultimately asks the
// control-plane "is this EC2 actually running?" (DescribeInstances-
// equivalent). When the answer is a CLEAN "no" it feeds the workspace
// into the EXISTING offline/auto-heal machinery (onOffline → status flip
// + RestartByID reprovision with the existing volume) — no new healing
// path, just real ground truth driving the one we already have.
//
// Guardrails:
// - FAIL-SAFE: IsRunning returns (true, err) on any transient DB/transport
// error and (false, nil) ONLY when CP genuinely reports the instance
// is not running. We act ONLY on (false, nil); any err short-circuits
// to "leave it alone" so a CP blip never flips a healthy workspace.
// - ONLINE + SaaS ONLY: status='online', instance_id present, and
// runtime <> 'external'. Paused/hibernated/removed/provisioning/
// awaiting_agent rows are out of scope; external rows are covered by
// the remote-heartbeat pass.
// - Per-cycle row cap + per-workspace timeout so one slow CP call can't
// stall the sweep.
import (
"context"
"log"
"time"
"git.moleculesai.app/molecule-ai/molecule-core/workspace-server/internal/db"
)
// InstanceRunningChecker is the narrow dependency the reconciler takes
// from the CP provisioner. *provisioner.CPProvisioner satisfies this
// naturally; tests inject fakes.
//
// Contract (load-bearing): IsRunning is FAIL-SAFE — it returns
// (true, err) on transient DB/transport errors and (false, nil) ONLY
// when CP reports the instance is genuinely not running. The reconciler
// flips a workspace offline strictly on (false, nil).
type InstanceRunningChecker interface {
IsRunning(ctx context.Context, workspaceID string) (bool, error)
}
// CPInstanceReconcileLimit caps the per-cycle row count so a sustained
// CP slowdown can't make a single sweep cycle run unbounded. With a 60s
// cadence and a per-workspace timeout below, this bounds worst-case
// cycle wall-time and lets subsequent cycles drain any backlog.
const CPInstanceReconcileLimit = 200
// cpInstanceCheckTimeout bounds a single IsRunning call so one slow CP
// round-trip can't stall the whole sweep. Each workspace gets its own
// timeout context derived from the cycle context.
const cpInstanceCheckTimeout = 10 * time.Second
// StartCPInstanceReconciler runs the authoritative EC2-state reconcile
// loop until ctx is cancelled. A nil checker makes the loop a no-op
// (matches the nil-tolerant pattern of the sibling CP sweeper).
//
// Caller is expected to gate on `cpProv != nil` (matching how
// StartCPOrphanSweeper is gated at the wiring site in cmd/server/main.go)
// — passing a nil *CPProvisioner here would also short-circuit, but the
// gate at the call site keeps the call shape symmetric across sweepers.
//
// interval <= 0 falls back to the default 60s cadence so a misconfigured
// caller can't spin a zero-duration ticker (which panics).
func StartCPInstanceReconciler(ctx context.Context, checker InstanceRunningChecker, onOffline OfflineHandler, interval time.Duration) {
if checker == nil {
log.Println("cp-instance-reconciler: checker is nil — reconciler disabled")
return
}
if interval <= 0 {
interval = 60 * time.Second
}
log.Printf("cp-instance-reconciler started — reconciling online SaaS workspaces against real EC2 state every %s", interval)
ticker := time.NewTicker(interval)
defer ticker.Stop()
// Kick once at boot so a platform restart starts healing immediately
// rather than waiting a full interval.
reconcileOnce(ctx, checker, onOffline)
for {
select {
case <-ctx.Done():
log.Println("cp-instance-reconciler: shutdown")
return
case <-ticker.C:
reconcileOnce(ctx, checker, onOffline)
}
}
}
// reconcileOnce executes one reconcile pass. Defensive against db.DB
// being nil so a misconfigured boot doesn't panic.
//
// Scope: online + SaaS-EC2 workspaces only. runtime='external' rows are
// excluded (covered by the remote-heartbeat pass); paused/hibernated/
// removed/provisioning/awaiting_agent are excluded by the status filter.
func reconcileOnce(ctx context.Context, checker InstanceRunningChecker, onOffline OfflineHandler) {
if db.DB == nil {
return
}
rows, err := db.DB.QueryContext(ctx, `
SELECT id::text
FROM workspaces
WHERE status = 'online'
AND instance_id IS NOT NULL
AND instance_id != ''
AND COALESCE(runtime, '') <> 'external'
ORDER BY updated_at DESC
LIMIT $1
`, CPInstanceReconcileLimit)
if err != nil {
log.Printf("cp-instance-reconciler: DB query failed: %v", err)
return
}
defer rows.Close()
var ids []string
for rows.Next() {
var id string
if scanErr := rows.Scan(&id); scanErr != nil {
log.Printf("cp-instance-reconciler: row scan failed: %v", scanErr)
continue
}
ids = append(ids, id)
}
if iterErr := rows.Err(); iterErr != nil {
log.Printf("cp-instance-reconciler: rows iteration failed: %v", iterErr)
return
}
for _, id := range ids {
// Per-workspace timeout so one slow CP round-trip can't stall
// the whole sweep.
checkCtx, cancel := context.WithTimeout(ctx, cpInstanceCheckTimeout)
running, checkErr := checker.IsRunning(checkCtx, id)
cancel()
if checkErr != nil {
// FAIL-SAFE: transient DB/transport error (or a no-backend
// signal). IsRunning returns (true, err) on these, so never
// flip — leave the row online and retry next cycle.
log.Printf("cp-instance-reconciler: IsRunning(%s) errored, leaving online (fail-safe): %v", id, checkErr)
continue
}
if running {
continue
}
// CLEAN "not running" — CP authoritatively reports the EC2 is
// terminated/stopped/absent. Feed it into the existing offline +
// auto-heal machinery: onOffline flips the row offline and
// triggers RestartByID, which reprovisions with the existing
// volume.
log.Printf("cp-instance-reconciler: workspace %s is status=online but its EC2 is not running (terminated/stopped) — flipping offline + triggering reprovision", id)
if onOffline != nil {
onOffline(ctx, id)
}
}
}
@@ -0,0 +1,282 @@
package registry
import (
"context"
"errors"
"sync"
"testing"
"time"
"github.com/DATA-DOG/go-sqlmock"
"git.moleculesai.app/molecule-ai/molecule-core/workspace-server/internal/db"
)
// fakeRunningChecker implements InstanceRunningChecker for the
// instance-reconciler tests. Records every IsRunning call so tests can
// assert which workspace IDs were probed, and returns a per-id
// (running, err) pair so we can model CP's three answers:
//
// (true, nil) — instance is running.
// (false, nil) — CLEAN "not running" (terminated/stopped/absent).
// (true, err) — transient DB/transport error (FAIL-SAFE path).
type fakeRunningChecker struct {
mu sync.Mutex
running map[string]bool
errs map[string]error
calls []string
}
func (f *fakeRunningChecker) IsRunning(_ context.Context, wsID string) (bool, error) {
f.mu.Lock()
defer f.mu.Unlock()
f.calls = append(f.calls, wsID)
if err, ok := f.errs[wsID]; ok {
// Mirror CPProvisioner.IsRunning: (true, err) on transient errors
// so callers stay on the alive path.
return true, err
}
return f.running[wsID], nil
}
// recordingOffline is an OfflineHandler that records the workspace IDs
// it was invoked with.
type recordingOffline struct {
mu sync.Mutex
calls []string
}
func (r *recordingOffline) handler() OfflineHandler {
return func(_ context.Context, wsID string) {
r.mu.Lock()
defer r.mu.Unlock()
r.calls = append(r.calls, wsID)
}
}
func (r *recordingOffline) got() []string {
r.mu.Lock()
defer r.mu.Unlock()
out := make([]string, len(r.calls))
copy(out, r.calls)
return out
}
// expectReconcileQuery registers the reconciler's SELECT, pinning the
// scope-critical predicates: status='online', instance_id present, and
// runtime <> 'external'. A future widening that drops any of these (e.g.
// sweeping paused rows, or external rows the heartbeat pass owns) fails
// every test that uses this helper.
func expectReconcileQuery(mock sqlmock.Sqlmock, rows *sqlmock.Rows) {
mock.ExpectQuery(`(?s)^\s*SELECT id::text\s+FROM workspaces\s+WHERE status = 'online'\s+AND instance_id IS NOT NULL\s+AND instance_id != ''\s+AND COALESCE\(runtime, ''\) <> 'external'\s+ORDER BY updated_at DESC\s+LIMIT \$1`).
WithArgs(CPInstanceReconcileLimit).
WillReturnRows(rows)
}
// TestReconcileOnce_NotRunning_FlipsOffline — the core bug (core#2247):
// an online SaaS workspace whose EC2 is terminated. CP reports a CLEAN
// (false, nil); onOffline MUST be called with that id so the existing
// auto-heal (status flip + RestartByID reprovision) kicks in.
func TestReconcileOnce_NotRunning_FlipsOffline(t *testing.T) {
mock := setupTestDB(t)
checker := &fakeRunningChecker{running: map[string]bool{"ws-dead": false}}
off := &recordingOffline{}
expectReconcileQuery(mock, sqlmock.NewRows([]string{"id"}).AddRow("ws-dead"))
reconcileOnce(context.Background(), checker, off.handler())
if got := off.got(); len(got) != 1 || got[0] != "ws-dead" {
t.Fatalf("expected onOffline(ws-dead), got %v", got)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Fatalf("unmet expectations: %v", err)
}
}
// TestReconcileOnce_Running_DoesNotFlip — healthy steady state. CP
// reports (true, nil); the workspace stays online, onOffline is NOT
// called.
func TestReconcileOnce_Running_DoesNotFlip(t *testing.T) {
mock := setupTestDB(t)
checker := &fakeRunningChecker{running: map[string]bool{"ws-alive": true}}
off := &recordingOffline{}
expectReconcileQuery(mock, sqlmock.NewRows([]string{"id"}).AddRow("ws-alive"))
reconcileOnce(context.Background(), checker, off.handler())
if got := off.got(); len(got) != 0 {
t.Fatalf("running workspace must NOT be flipped offline, got %v", got)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Fatalf("unmet expectations: %v", err)
}
}
// TestReconcileOnce_TransientError_DoesNotFlip — FAIL-SAFE contract.
// IsRunning returns (true, err) on a transient DB/transport blip; the
// reconciler MUST NOT flip the workspace offline. This is the guardrail
// that stops a CP outage from cascading every healthy workspace through
// reprovision.
func TestReconcileOnce_TransientError_DoesNotFlip(t *testing.T) {
mock := setupTestDB(t)
checker := &fakeRunningChecker{
errs: map[string]error{"ws-blip": errors.New("cp provisioner: status: connection reset")},
}
off := &recordingOffline{}
expectReconcileQuery(mock, sqlmock.NewRows([]string{"id"}).AddRow("ws-blip"))
reconcileOnce(context.Background(), checker, off.handler())
if got := off.got(); len(got) != 0 {
t.Fatalf("fail-safe violated: transient IsRunning error must NOT flip offline, got %v", got)
}
if calls := checker.calls; len(calls) != 1 || calls[0] != "ws-blip" {
t.Fatalf("expected IsRunning(ws-blip), got %v", checker.calls)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Fatalf("unmet expectations: %v", err)
}
}
// TestReconcileOnce_QueryScopeExcludesExternalAndNonOnline — pins the
// SELECT predicate. The regex in expectReconcileQuery requires
// status='online' AND runtime <> 'external'; if a future edit widens the
// scope to include paused/hibernated/removed rows or external rows (owned
// by the heartbeat pass), this query no longer matches and sqlmock fails
// the test. With the predicate intact, a DB that has only out-of-scope
// rows returns empty → no IsRunning, no flip.
func TestReconcileOnce_QueryScopeExcludesExternalAndNonOnline(t *testing.T) {
mock := setupTestDB(t)
checker := &fakeRunningChecker{}
off := &recordingOffline{}
// The predicate filters out external + non-online rows server-side,
// modelled as the empty result those filters produce.
expectReconcileQuery(mock, sqlmock.NewRows([]string{"id"}))
reconcileOnce(context.Background(), checker, off.handler())
if len(checker.calls) != 0 {
t.Fatalf("out-of-scope rows must never reach IsRunning, got %v", checker.calls)
}
if got := off.got(); len(got) != 0 {
t.Fatalf("expected no offline flips for out-of-scope rows, got %v", got)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Fatalf("unmet expectations: %v", err)
}
}
// TestReconcileOnce_MixedBatch — each row is judged independently: the
// dead one flips, the alive one and the transient-error one don't.
func TestReconcileOnce_MixedBatch(t *testing.T) {
mock := setupTestDB(t)
checker := &fakeRunningChecker{
running: map[string]bool{"ws-dead": false, "ws-alive": true},
errs: map[string]error{"ws-blip": errors.New("503")},
}
off := &recordingOffline{}
expectReconcileQuery(mock, sqlmock.NewRows([]string{"id"}).
AddRow("ws-dead").
AddRow("ws-alive").
AddRow("ws-blip"))
reconcileOnce(context.Background(), checker, off.handler())
if got := off.got(); len(got) != 1 || got[0] != "ws-dead" {
t.Fatalf("expected only ws-dead flipped, got %v", got)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Fatalf("unmet expectations: %v", err)
}
}
// TestReconcileOnce_QueryError — DB transient failure. Reconcile returns
// without panicking and never probes IsRunning or flips anything.
func TestReconcileOnce_QueryError(t *testing.T) {
mock := setupTestDB(t)
checker := &fakeRunningChecker{}
off := &recordingOffline{}
mock.ExpectQuery(`(?s)^\s*SELECT id::text\s+FROM workspaces`).
WithArgs(CPInstanceReconcileLimit).
WillReturnError(errors.New("connection refused"))
reconcileOnce(context.Background(), checker, off.handler())
if len(checker.calls) != 0 || len(off.got()) != 0 {
t.Fatalf("query error must short-circuit; calls=%v offline=%v", checker.calls, off.got())
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Fatalf("unmet expectations: %v", err)
}
}
// TestReconcileOnce_NilDB — defensive against db.DB being nil. Must not
// panic, must not probe, must not flip.
func TestReconcileOnce_NilDB(t *testing.T) {
saved := db.DB
db.DB = nil
t.Cleanup(func() { db.DB = saved })
checker := &fakeRunningChecker{}
off := &recordingOffline{}
reconcileOnce(context.Background(), checker, off.handler())
if len(checker.calls) != 0 || len(off.got()) != 0 {
t.Fatalf("nil db.DB must short-circuit; calls=%v offline=%v", checker.calls, off.got())
}
}
// TestStartCPInstanceReconciler_NilCheckerDisabled — boot-safety: a SaaS
// CP without cpProv configured must not start the loop (immediate return,
// no goroutine leak).
func TestStartCPInstanceReconciler_NilCheckerDisabled(t *testing.T) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
done := make(chan struct{})
go func() {
StartCPInstanceReconciler(ctx, nil, nil, 60*time.Second)
close(done)
}()
select {
case <-done:
// expected — nil checker short-circuits.
case <-time.After(500 * time.Millisecond):
t.Fatal("StartCPInstanceReconciler(nil) did not return immediately")
}
}
// TestStartCPInstanceReconciler_RunsOnceImmediatelyAndExitsOnCancel —
// cadence contract: one sweep at boot (so a restart starts healing
// immediately), and the loop terminates on ctx cancel.
func TestStartCPInstanceReconciler_RunsOnceImmediatelyAndExitsOnCancel(t *testing.T) {
mock := setupTestDB(t)
checker := &fakeRunningChecker{}
off := &recordingOffline{}
// Boot sweep query. The 60s ticker won't fire inside the test window; the
// second expectation is only a safety net for a stray tick. This test does
// not call ExpectationsWereMet, so leaving it unconsumed does not fail.
expectReconcileQuery(mock, sqlmock.NewRows([]string{"id"}))
expectReconcileQuery(mock, sqlmock.NewRows([]string{"id"}))
ctx, cancel := context.WithCancel(context.Background())
done := make(chan struct{})
go func() {
StartCPInstanceReconciler(ctx, checker, off.handler(), 60*time.Second)
close(done)
}()
time.Sleep(100 * time.Millisecond)
cancel()
select {
case <-done:
// expected
case <-time.After(2 * time.Second):
t.Fatal("StartCPInstanceReconciler did not exit on ctx cancel")
}
}
@@ -0,0 +1,9 @@
-- Rollback for 20260604000000_activity_logs_seq.up.sql.
-- Drops the feed-ordering index and the monotonic seq column.
-- Run manually by an operator via psql; the boot-time runner never applies
-- *.down.sql (see RunMigrations in internal/db/postgres.go, issue #211).
DROP INDEX IF EXISTS idx_activity_ws_created_seq;
ALTER TABLE activity_logs
DROP COLUMN IF EXISTS seq;
@@ -0,0 +1,54 @@
-- Add a monotonic `seq` tiebreaker to activity_logs to make the poll-mode
-- since_id activity feed (#2339) deterministically ordered.
--
-- ROOT CAUSE this fixes: the feed orders by created_at ASC/DESC with NO
-- tiebreaker, and activity_logs.id is a random gen_random_uuid() — there is
-- no monotonic column to break ties. Two rows inserted in the same
-- microsecond (back-to-back A2A logging) share a created_at and come back in
-- arbitrary planner order, so the E2E intermittently sees
-- hello-from-e2e-3 before hello-from-e2e-2. Not a flake — a missing
-- tiebreaker. (Second, related bug fixed in the handler: the since_id cursor
-- filtered `created_at > cursor` strictly, silently dropping a row written in
-- the same microsecond as the cursor row. The composite key below lets the
-- handler compare the full (created_at, seq) tuple.)
--
-- `seq` is a GENERATED BY DEFAULT AS IDENTITY BIGINT — a UNIQUE,
-- monotonic-once-assigned tiebreaker. Precisely (verified on PostgreSQL
-- 16.13, the prod version):
-- * Backfill: adding the IDENTITY column to a populated table REWRITES the
-- table and assigns `seq` to every EXISTING row during the ALTER, in
-- PHYSICAL TABLE-SCAN order (NOT NULL — existing rows do get a value).
-- That order is not guaranteed to equal historical insertion order.
-- * The identity sequence then advances ABOVE max(seq), so every subsequent
-- INSERT that omits `seq` gets a fresh value strictly greater than the
-- backfilled max — collision-free with the backfilled rows.
-- * GENERATED BY DEFAULT (not ALWAYS) so existing INSERTs that don't name
-- `seq` keep working and a caller may still override it if ever needed.
--
-- What `seq` is NOT, and why that's fine:
-- * NOT guaranteed gap-free — rolled-back transactions burn sequence values.
-- * NOT a strict commit-order guarantee under concurrency — two concurrent
-- INSERTs may commit in the opposite order to the `seq` values they drew.
-- Neither property is needed. The feed only requires a TOTAL, STABLE
-- tiebreaker so that (created_at, seq) is a deterministic order: for any two
-- rows it always sorts them the same way and never ties. `seq` being unique
-- and non-null on every row delivers exactly that. Same-created_at rows were
-- returned in ARBITRARY order before this migration; afterward they have a
-- fixed, repeatable order — strictly better, never worse. New traffic is fully
-- deterministic; the backfill makes historical rows deterministic too.
--
-- Idempotent: ADD COLUMN IF NOT EXISTS + CREATE INDEX IF NOT EXISTS so the
-- boot-time runner (and the CI migrate-replay step) can re-apply this safely.
ALTER TABLE activity_logs
ADD COLUMN IF NOT EXISTS seq BIGINT GENERATED BY DEFAULT AS IDENTITY;
-- Composite index supporting the feed query: WHERE workspace_id = $1
-- AND created_at <cmp> $t ORDER BY created_at, seq. The (workspace_id,
-- created_at, seq) prefix serves both the ASC cursor path and the DESC recent
-- path (Postgres reads the same btree backwards for DESC). This is distinct
-- from migration 009's idx_activity_ws_type_time (workspace_id, activity_type,
-- created_at) — that one is type-prefixed and can't drive a type-agnostic feed
-- scan — and from 048's per-peer source_id/target_id indexes.
CREATE INDEX IF NOT EXISTS idx_activity_ws_created_seq
ON activity_logs (workspace_id, created_at, seq);