Compare commits

..

26 Commits

Author SHA1 Message Date
agent-reviewer e4d8229877 Merge pull request 'fix(compute): consolidate cloud-provider + instance-type SSOT (#2489)' (#2491) from fix/ssot-consolidate-compute-options into main
Block internal-flavored paths / Block forbidden paths (push) Has started running
ci-arm64-advisory / fast-checks (push) Waiting to run
CI / Detect changes (push) Has started running
CI / Platform (Go) (push) Blocked by required conditions
CI / Canvas (Next.js) (push) Blocked by required conditions
CI / Shellcheck (E2E scripts) (push) Blocked by required conditions
CI / Canvas Deploy Status (push) Blocked by required conditions
CI / all-required (push) Blocked by required conditions
CI / Python Lint & Test (push) Successful in 10s
E2E Chat / detect-changes (push) Has started running
E2E Chat / E2E Chat (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / detect-changes (push) Has started running
E2E API Smoke Test / detect-changes (push) Successful in 19s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Harness Replays / detect-changes (push) Successful in 9s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (push) Successful in 7s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Successful in 9s
Handlers Postgres Integration / detect-changes (push) Successful in 17s
publish-canvas-image / Promote canvas :latest to CI-green build (push) Blocked by required conditions
publish-canvas-image / Build & push canvas image (push) Has started running
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Successful in 16s
Harness Replays / Harness Replays (push) Successful in 21s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 19s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (push) Failing after 1m2s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 1m11s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (push) Failing after 54s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 5m9s
publish-workspace-server-image / build-and-push (push) Successful in 6m10s
publish-workspace-server-image / Production auto-deploy (push) Failing after 1h0m26s
2026-06-09 19:24:16 +00:00
core-devops e9dea8233b fix(compute): consolidate cloud-provider + instance-type SSOT (#2489)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 9s
CI / Python Lint & Test (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 13s
E2E API Smoke Test / detect-changes (pull_request) Successful in 14s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
Harness Replays / detect-changes (pull_request) Successful in 8s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 13s
E2E Chat / detect-changes (pull_request) Successful in 19s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 5s
Harness Replays / Harness Replays (pull_request) Successful in 7s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 16s
E2E Chat / E2E Chat (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
gate-check-v3 / gate-check (pull_request_target) Successful in 14s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 12s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m18s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m28s
CI / Platform (Go) (pull_request) Successful in 4m17s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 4m15s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5m14s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 3m59s
CI / Canvas (Next.js) (pull_request) Successful in 9m17s
CI / Canvas Deploy Status (pull_request) Successful in 2s
CI / all-required (pull_request) Successful in 2s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 6s
security-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 10s
Cloud-provider and instance-type metadata was hardcoded in two places that
could drift: the canvas ContainerConfigTab.tsx and the workspace-server
workspace_compute.go allowlist. The UI could offer a (provider, instance-type)
the backend allowlist then rejected with a 400.

Approach (a): the workspace-server is now the single source of truth. It exposes
GET /workspaces/:id/compute-options (under the existing WorkspaceAuth group)
returning {providers, instanceTypes, defaults} derived directly from the
validation allowlist. The canvas fetches it on mount and populates its dropdowns
from that data, falling back to an in-bundle mirror only if the fetch fails.

Backend:
- workspace_compute.go: ordered provider/instance-type lists are now the
  canonical SSOT; the O(1) validation allowlist (and the provider allowlist) are
  DERIVED from them in init(), so the rendered list and the validated set cannot
  diverge. Added buildComputeOptions() + the ComputeOptions handler.
- router.go: wired GET /workspaces/:id/compute-options under WorkspaceAuth.
- Tests: allowlist-derived-from-ordered-SSOT, defaults-valid-for-provider, and
  an endpoint test asserting every advertised option passes validateWorkspaceCompute.

Canvas:
- ContainerConfigTab.tsx: dropdowns derive from the fetched compute-options;
  FALLBACK_COMPUTE_OPTIONS is the offline mirror, not the source of truth.
- Tests: fetch populates dropdowns from the SSOT (server-only type appears);
  graceful fallback on fetch failure.

Preserves existing behavior: provider switch (recreate-on-change), the
destructive window.confirm, isSaaS gating, and the deterministic provider-switch
tests all still pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 12:04:25 -07:00
agent-reviewer 42f77aba28 Merge pull request 'test(scheduler): add missing unit tests for classifyTaskState, isEmptyResponse, a2aErrorFromBody' (#2486) from fix/add-missing-scheduler-unit-tests into main
ci-arm64-advisory / fast-checks (push) Waiting to run
CI / Python Lint & Test (push) Successful in 4s
Block internal-flavored paths / Block forbidden paths (push) Successful in 8s
CI / Detect changes (push) Successful in 9s
Handlers Postgres Integration / detect-changes (push) Successful in 6s
E2E Chat / detect-changes (push) Successful in 10s
Harness Replays / detect-changes (push) Successful in 5s
CI / Canvas (Next.js) (push) Successful in 3s
CI / Shellcheck (E2E scripts) (push) Successful in 3s
E2E API Smoke Test / detect-changes (push) Successful in 16s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (push) Successful in 6s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Successful in 8s
CI / Canvas Deploy Status (push) Successful in 2s
Secret scan / Scan diff for credential-shaped strings (push) Has started running
Harness Replays / Harness Replays (push) Successful in 3s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 18s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Successful in 17s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 4m7s
publish-workspace-server-image / build-and-push (push) Successful in 4m41s
E2E Chat / E2E Chat (push) Failing after 5m36s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (push) Failing after 5m56s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 6m43s
CI / Platform (Go) (push) Successful in 8m57s
CI / all-required (push) Successful in 3s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (push) Failing after 5m28s
publish-workspace-server-image / Production auto-deploy (push) Failing after 1h0m48s
2026-06-09 18:34:26 +00:00
Molecule AI Dev Engineer A (Kimi) 6c9cc581c9 chore: retrigger CI — Local Provision E2E stub failed on provisioning timeout (infra flake on main, unrelated to scheduler test addition)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
gate-check-v3 / gate-check (pull_request_target) Blocked by required conditions
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 17s
Block internal-flavored paths / Block forbidden paths (pull_request) Has started running
CI / Python Lint & Test (pull_request) Successful in 7s
CI / Detect changes (pull_request) Successful in 19s
E2E API Smoke Test / detect-changes (pull_request) Successful in 17s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 15s
E2E Chat / detect-changes (pull_request) Successful in 24s
Harness Replays / detect-changes (pull_request) Successful in 17s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 7s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 34s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 17s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 13s
CI / Canvas (Next.js) (pull_request) Successful in 2s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 3s
Harness Replays / Harness Replays (pull_request) Successful in 5s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m6s
E2E Chat / E2E Chat (pull_request) Successful in 5s
CI / Canvas Deploy Status (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 14s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m15s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 4m13s
CI / Platform (Go) (pull_request) Successful in 4m10s
CI / all-required (pull_request) Successful in 1s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 55s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5m3s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 10s
qa-review / approved (pull_request_review) Successful in 10s
audit-force-merge / audit (pull_request_target) Successful in 8s
2026-06-09 18:15:32 +00:00
agent-dev-a 09b1ffb5cc Merge branch 'main' into fix/add-missing-scheduler-unit-tests
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 10s
CI / Python Lint & Test (pull_request) Successful in 9s
CI / Detect changes (pull_request) Successful in 20s
E2E API Smoke Test / detect-changes (pull_request) Successful in 22s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 15s
E2E Chat / detect-changes (pull_request) Successful in 23s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 15s
CI / Canvas (Next.js) (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 9s
Harness Replays / detect-changes (pull_request) Successful in 10s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 7s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m3s
Secret scan / Scan diff for credential-shaped strings (pull_request) Has started running
gate-check-v3 / gate-check (pull_request_target) Has started running
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 1m3s
qa-review / approved (pull_request_target) Has started running
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Waiting to run
security-review / approved (pull_request_target) Has started running
sop-checklist / review-refire (pull_request_target) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9s
E2E Chat / E2E Chat (pull_request) Successful in 10s
CI / Platform (Go) (pull_request) Successful in 4m33s
CI / all-required (pull_request) Has been cancelled
CI / Canvas Deploy Status (pull_request) Has been cancelled
Harness Replays / Harness Replays (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m20s
sop-checklist / all-items-acked (pull_request_target) Has been cancelled
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5m18s
2026-06-09 18:08:39 +00:00
agent-reviewer 312168aefc Merge pull request 'test(middleware): add missing unit tests for tenantSlug and cpSessionVerifyURL' (#2485) from fix/add-missing-middleware-unit-tests into main
ci-arm64-advisory / fast-checks (push) Waiting to run
Block internal-flavored paths / Block forbidden paths (push) Successful in 7s
CI / Python Lint & Test (push) Successful in 6s
CI / Detect changes (push) Successful in 20s
E2E API Smoke Test / detect-changes (push) Successful in 20s
E2E Chat / detect-changes (push) Successful in 22s
CI / Canvas (Next.js) (push) Successful in 4s
CI / Shellcheck (E2E scripts) (push) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 24s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (push) Successful in 37s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Successful in 16s
Handlers Postgres Integration / detect-changes (push) Successful in 9s
CI / Canvas Deploy Status (push) Successful in 3s
Harness Replays / detect-changes (push) Successful in 11s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (push) Successful in 6s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Successful in 16s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 14s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (push) Failing after 1m7s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (push) Waiting to run
publish-workspace-server-image / build-and-push (push) Successful in 4m4s
CI / Platform (Go) (push) Successful in 4m35s
publish-workspace-server-image / Production auto-deploy (push) Has started running
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 5m25s
Harness Replays / Harness Replays (push) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 3m39s
E2E Chat / E2E Chat (push) Failing after 7m36s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (push) Failing after 8m31s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (push) Failing after 2m39s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (push) Failing after 6m47s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (push) Successful in 27s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (push) Failing after 2m44s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (push) Failing after 16m25s
E2E Staging SaaS (full lifecycle) / pr-validate (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / all-required (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (push) Waiting to run
2026-06-09 18:08:14 +00:00
agent-dev-a c8474fdc26 Merge pull request 'fix(tests): reduce adapter.py fixture to cpConfigFilesMaxBytes-100 (#1093)' (#2456) from fix/1093-adapter-py-test-margin into main
ci-arm64-advisory / fast-checks (push) Waiting to run
Block internal-flavored paths / Block forbidden paths (push) Successful in 6s
CI / Python Lint & Test (push) Successful in 5s
E2E Chat / E2E Chat (push) Blocked by required conditions
CI / Detect changes (push) Successful in 13s
E2E API Smoke Test / detect-changes (push) Successful in 13s
CI / Canvas (Next.js) (push) Successful in 3s
CI / Shellcheck (E2E scripts) (push) Successful in 3s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (push) Blocked by required conditions
Harness Replays / detect-changes (push) Successful in 9s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Successful in 8s
E2E Chat / detect-changes (push) Has started running
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (push) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 16s
Handlers Postgres Integration / detect-changes (push) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (push) Has started running
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Successful in 17s
CI / Canvas Deploy Status (push) Successful in 3s
Harness Replays / Harness Replays (push) Successful in 5s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (push) Has started running
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 9s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 1m21s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (push) Failing after 24s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (push) Successful in 27s
E2E API Smoke Test / E2E API Smoke Test (push) Has started running
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (push) Failing after 2m37s
publish-workspace-server-image / build-and-push (push) Successful in 9m4s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (push) Failing after 3m15s
CI / Platform (Go) (push) Successful in 9m48s
CI / all-required (push) Successful in 2s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (push) Failing after 6m27s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (push) Failing after 9m12s
E2E Staging SaaS (full lifecycle) / pr-validate (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
publish-workspace-server-image / Production auto-deploy (push) Failing after 1h0m29s
2026-06-09 17:38:55 +00:00
agent-dev-a 98f08397d0 Merge pull request 'chore(dead-code): remove unused QueueDepth function' (#2457) from fix/remove-dead-code-QueueDepth into main
ci-arm64-advisory / fast-checks (push) Has been cancelled
Block internal-flavored paths / Block forbidden paths (push) Successful in 6s
CI / Platform (Go) (push) Has been cancelled
CI / Canvas (Next.js) (push) Has been cancelled
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (push) Successful in 5s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Successful in 6s
Harness Replays / Harness Replays (push) Successful in 3s
Harness Replays / detect-changes (push) Successful in 7s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Successful in 16s
Secret scan / Scan diff for credential-shaped strings (push) Has started running
CI / Shellcheck (E2E scripts) (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Canvas Deploy Status (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / all-required (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Detect changes (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Python Lint & Test (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Waiting to run
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 16s
Handlers Postgres Integration / Handlers Postgres Integration (push) Waiting to run
Handlers Postgres Integration / detect-changes (push) Successful in 8s
publish-workspace-server-image / build-and-push (push) Successful in 4m32s
E2E Chat / E2E Chat (push) Failing after 5m44s
E2E Chat / detect-changes (push) Successful in 19s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (push) Failing after 6m27s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (push) Failing after 2m28s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (push) Failing after 2m56s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (push) Failing after 5m38s
E2E Staging SaaS (full lifecycle) / pr-validate (push) Successful in 28s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (push) Successful in 1m7s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (push) Failing after 3m40s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (push) Failing after 5m34s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (push) Failing after 1m3s
E2E API Smoke Test / E2E API Smoke Test (push) Waiting to run
E2E API Smoke Test / detect-changes (push) Successful in 13s
publish-workspace-server-image / Production auto-deploy (push) Failing after 1h0m15s
2026-06-09 17:38:53 +00:00
molecule-code-reviewer b1c623210c Merge pull request 'feat(prod-deploy): tolerate a quarantined straggler minority in the fleet rollout' (#2484) from fix/deploy-straggler-tolerance into main
ci-arm64-advisory / fast-checks (push) Waiting to run
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (push) Blocked by required conditions
CI / Detect changes (push) Successful in 9s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Successful in 13s
CI / Python Lint & Test (push) Successful in 6s
Block internal-flavored paths / Block forbidden paths (push) Successful in 27s
E2E API Smoke Test / detect-changes (push) Successful in 12s
E2E Chat / detect-changes (push) Successful in 12s
Handlers Postgres Integration / detect-changes (push) Successful in 9s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 8s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (push) Has started running
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Successful in 10s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 21s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (push) Has started running
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (push) Has started running
Secret scan / Scan diff for credential-shaped strings (push) Has started running
Ops Scripts Tests / Ops scripts (unittest) (push) Has started running
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (push) Successful in 15s
E2E API Smoke Test / E2E API Smoke Test (push) Has started running
E2E Chat / E2E Chat (push) Has started running
CI / Platform (Go) (push) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (push) Has started running
CI / Canvas (Next.js) (push) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Has started running
CI / Canvas Deploy Status (push) Successful in 5s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (push) Successful in 2m10s
publish-workspace-server-image / build-and-push (push) Successful in 7m30s
publish-workspace-server-image / Production auto-deploy (push) Failing after 8m11s
CI / all-required (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Shellcheck (E2E scripts) (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
2026-06-09 17:23:14 +00:00
Molecule AI Dev Engineer A (Kimi) 7a80cc064a test(scheduler): add missing unit tests for classifyTaskState, isEmptyResponse, a2aErrorFromBody
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
CI / Python Lint & Test (pull_request) Successful in 6s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 15s
CI / Detect changes (pull_request) Successful in 14s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Blocked by required conditions
Handlers Postgres Integration / detect-changes (pull_request) Has started running
CI / Shellcheck (E2E scripts) (pull_request) Successful in 3s
E2E API Smoke Test / detect-changes (pull_request) Successful in 16s
CI / Canvas (Next.js) (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Has started running
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 17s
E2E Chat / detect-changes (pull_request) Successful in 23s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 7s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 9s
CI / Canvas Deploy Status (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 23s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 10s
Harness Replays / Harness Replays (pull_request) Successful in 3s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 16s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 3m50s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5m8s
CI / Platform (Go) (pull_request) Successful in 7m46s
CI / all-required (pull_request) Successful in 2s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 7m0s
qa-review / approved (pull_request_review) Has started running
security-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 27s
qa-review / approved (pull_request_target) Successful in 5s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4
sop-checklist / na-declarations (pull_request) N/A: (none)
gate-check-v3 / gate-check (pull_request_target) Failing after 6s
sop-checklist / all-items-acked (pull_request_target) Successful in 5s
Adds coverage for three previously-untested helpers in scheduler.go:
- TestClassifyTaskState_*: verifies OK states return empty, failure states
  are surfaced, and malformed JSON is handled gracefully.
- TestIsEmptyResponse_*: verifies empty bodies and sentinel strings are
  detected as empty, while actual content is not.
- TestA2AErrorFromBody_*: verifies JSON-RPC and plain error extraction,
  plus empty/invalid JSON fallbacks.

Full scheduler suite (49 tests) passes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 17:05:18 +00:00
agent-reviewer b7282b41f8 Merge pull request 'fix(provisioner): remove 12-char UUID truncation from container/volume names (KI-013)' (#2482) from fix/KI-013-provisioner-uuid-truncation into main
ci-arm64-advisory / fast-checks (push) Waiting to run
Block internal-flavored paths / Block forbidden paths (push) Successful in 9s
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E API Smoke Test / detect-changes (push) Has started running
E2E Chat / E2E Chat (push) Blocked by required conditions
E2E Chat / detect-changes (push) Has started running
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / detect-changes (push) Has started running
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Has started running
Handlers Postgres Integration / detect-changes (push) Successful in 9s
Harness Replays / detect-changes (push) Successful in 13s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (push) Successful in 10s
Harness Replays / Harness Replays (push) Successful in 3s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Successful in 15s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 12s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (push) Failing after 1m2s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 1m20s
publish-workspace-server-image / build-and-push (push) Successful in 3m56s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (push) Failing after 4m46s
E2E Staging SaaS (full lifecycle) / pr-validate (push) Successful in 27s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (push) Successful in 25s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (push) Failing after 2m32s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (push) Failing after 2m34s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (push) Failing after 6m10s
publish-workspace-server-image / Production auto-deploy (push) Failing after 16m43s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (push) Failing after 8m11s
CI / Platform (Go) (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Canvas (Next.js) (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Shellcheck (E2E scripts) (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Canvas Deploy Status (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / all-required (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Detect changes (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Python Lint & Test (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (push) Failing after 15m17s
2026-06-09 17:00:05 +00:00
devops-engineer 1a88e9aeac Merge pull request 'fix(ci): self-heal e2e-chat testcontainer leaks (pre-run sweep + timeout cleanup)' (#2480) from fix/e2e-chat-testcontainer-leak into main
Block internal-flavored paths / Block forbidden paths (push) Successful in 7s
CI / Python Lint & Test (push) Successful in 7s
E2E API Smoke Test / detect-changes (push) Successful in 17s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Has started running
Handlers Postgres Integration / detect-changes (push) Successful in 10s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 22s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Has started running
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 4s
E2E Chat / detect-changes (push) Successful in 28s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (push) Successful in 7s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (push) Has started running
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 6s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (push) Successful in 11s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Successful in 19s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 8s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (push) Successful in 48s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (push) Successful in 45s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 3m23s
publish-workspace-server-image / build-and-push (push) Successful in 7m48s
publish-workspace-server-image / Production auto-deploy (push) Failing after 38s
ci-arm64-advisory / fast-checks (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Platform (Go) (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Canvas (Next.js) (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Shellcheck (E2E scripts) (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Canvas Deploy Status (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / all-required (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Detect changes (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
E2E Chat / E2E Chat (push) Failing after 9m46s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
2026-06-09 16:55:12 +00:00
Molecule AI Dev Engineer A (Kimi) 9fde1b5506 fix(provisioner): KI-013 deploy-safe rollout — backward-compat lookups for legacy truncated names
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 9s
CI / Python Lint & Test (pull_request) Successful in 9s
CI / Detect changes (pull_request) Successful in 19s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (pull_request) Has been skipped
E2E API Smoke Test / detect-changes (pull_request) Successful in 13s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 3s
E2E Chat / detect-changes (pull_request) Successful in 21s
CI / Canvas (Next.js) (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 11s
E2E Chat / E2E Chat (pull_request) Successful in 3s
Harness Replays / detect-changes (pull_request) Successful in 14s
CI / Canvas Deploy Status (pull_request) Successful in 9s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (pull_request) Successful in 25s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 28s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
sop-checklist / review-refire (pull_request_target) Has been skipped
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
Harness Replays / Harness Replays (pull_request) Successful in 5s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 15s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
security-review / approved (pull_request_target) Failing after 12s
sop-checklist / na-declarations (pull_request) N/A: (none)
gate-check-v3 / gate-check (pull_request_target) Failing after 19s
sop-checklist / all-items-acked (pull_request_target) Successful in 22s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 56s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m0s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 57s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5m10s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Failing after 6m23s
CI / Platform (Go) (pull_request) Successful in 8m5s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 8m28s
CI / all-required (pull_request) Successful in 11s
security-review / approved (pull_request_review) Has started running
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 22s
audit-force-merge / audit (pull_request_target) Successful in 43s
The KI-013 fix changes container/volume names from truncated 12-char IDs
to full UUIDs. Without a migration path, a deploy would orphan all
existing containers/volumes because Stop/IsRunning/RemoveVolume would
look for new names while old objects still use old names.

Add deploy-safety backward compatibility:
- legacyContainerName / legacyConfigVolumeName / legacyClaudeSessionVolumeName
  helpers that return the pre-KI-013 truncated names.
- RunningContainerName tries new name first, falls back to legacy name.
- Stop tries new name first, falls back to legacy name.
- RemoveVolume removes BOTH new and legacy names (idempotent).
- Start mounts the legacy config/claude-sessions volume if it still exists,
  so pre-deploy workspace data is preserved across restarts.
- WriteAuthTokenToVolume writes to the legacy volume if it still exists.

New workspaces get full-ID names. Existing workspaces keep using their
old truncated-name volumes until they are deleted/recreated. The orphan
sweeper will eventually clean up old containers when workspaces are removed.

Full provisioner suite (42 tests) passes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 16:48:18 +00:00
core-devops a7bdb8d860 feat(prod-deploy): tolerate a quarantined straggler minority in the fleet rollout
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
CI / Python Lint & Test (pull_request) Successful in 4s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 11s
CI / Detect changes (pull_request) Successful in 11s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 4s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
E2E Chat / detect-changes (pull_request) Successful in 12s
CI / Canvas Deploy Status (pull_request) Successful in 2s
E2E API Smoke Test / detect-changes (pull_request) Successful in 17s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 12s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
CI / all-required (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 6s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 11s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 14s
gate-check-v3 / gate-check (pull_request_target) Successful in 13s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 57s
sop-checklist / all-items-acked (pull_request_target) Successful in 7s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m9s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m20s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m20s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m14s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m26s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 7m3s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 40s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 9s
qa-review / approved (pull_request_review) Successful in 9s
audit-force-merge / audit (pull_request_target) Successful in 31s
Companion to controlplane #648 (redeploy-fleet straggler tolerance). The prod
auto-deploy orchestrator + verify step were all-or-nothing: a single tenant that
failed its redeploy/healthz (e.g. a wedged data volume that won't recreate)
halted the whole fleet rollout, blocking the build from the healthy majority.
Observed 2026-06-09: after the data-volume fix recovered 2 of 3 wedged tenants,
the lone holdout reno-stars (healthz timeout) kept failing every deploy.

- prod-auto-deploy.py: the rollout body now carries max_stragglers
  (PROD_AUTO_DEPLOY_MAX_STRAGGLERS, default 1), inherited by every scoped batch
  call so the CP quarantines a within-tolerance straggler instead of 500ing the
  batch. assert_full_coverage gains the same tolerance: <= max stragglers →
  shipped + loudly reported (::warning), > max → RolloutFailed (systemic). The
  canary still must pass; a clean rollout still sets no `stragglers` key.
- publish-workspace-server-image.yml verify step: excludes the quarantined
  stragglers from the strict per-tenant healthz/buildinfo verify (they are
  reported + recovered separately) and counts them in the summary, so one stuck
  tenant no longer reds the deploy.

Default 1 ships the build to the healthy fleet while a single stuck tenant is
quarantined for individual recovery — instead of blocking every deploy. Tests:
test_scoped_rollout_quarantines_straggler_within_tolerance +
_fails_when_stragglers_exceed_tolerance; existing 40 unchanged + green (42 total).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 09:45:58 -07:00
agent-reviewer a342a0218e Merge pull request 'fix(sop-checklist): restore author self-ack rejection' (#2479) from fix/sop-checklist-author-self-ack into main
ci-arm64-advisory / fast-checks (push) Waiting to run
Block internal-flavored paths / Block forbidden paths (push) Successful in 9s
E2E API Smoke Test / detect-changes (push) Has started running
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
CI / Python Lint & Test (push) Successful in 8s
CI / Detect changes (push) Successful in 12s
CI / Platform (Go) (push) Successful in 3s
CI / Shellcheck (E2E scripts) (push) Successful in 3s
CI / Canvas (Next.js) (push) Successful in 3s
E2E Chat / detect-changes (push) Successful in 12s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (push) Successful in 5s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Successful in 5s
CI / Canvas Deploy Status (push) Successful in 3s
Handlers Postgres Integration / detect-changes (push) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 12s
CI / all-required (push) Successful in 2s
E2E Chat / E2E Chat (push) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 3s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 6s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Successful in 15s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (push) Successful in 42s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (push) Has started running
Ops Scripts Tests / Ops scripts (unittest) (push) Successful in 1m3s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 3m17s
publish-workspace-server-image / build-and-push (push) Successful in 7m1s
publish-workspace-server-image / Production auto-deploy (push) Failing after 4m47s
2026-06-09 16:30:26 +00:00
agent-dev-a b4a7933ddb Merge pull request 'fix(ci): hard-code 127.0.0.1 + MOLECULE_IN_DOCKER=false + PLATFORM_URL discovery in local-provision E2E' (#2478) from fix/local-provision-e2e-ipv4-hardcode into main
ci-arm64-advisory / fast-checks (push) Waiting to run
CI / Python Lint & Test (push) Successful in 4s
Block internal-flavored paths / Block forbidden paths (push) Successful in 9s
CI / Detect changes (push) Successful in 9s
E2E API Smoke Test / detect-changes (push) Successful in 9s
E2E Chat / detect-changes (push) Successful in 9s
Handlers Postgres Integration / detect-changes (push) Successful in 7s
CI / Canvas (Next.js) (push) Successful in 2s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 9s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 4s
CI / Platform (Go) (push) Successful in 5s
CI / Shellcheck (E2E scripts) (push) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Successful in 4s
E2E Chat / E2E Chat (push) Successful in 4s
CI / Canvas Deploy Status (push) Successful in 2s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (push) Successful in 5s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (push) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 11s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 7s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Successful in 15s
CI / all-required (push) Successful in 8s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (push) Successful in 46s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (push) Successful in 1m15s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (push) Successful in 1m40s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (push) Successful in 45s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 2m40s
publish-workspace-server-image / build-and-push (push) Successful in 3m53s
publish-workspace-server-image / Production auto-deploy (push) Failing after 3m59s
2026-06-09 16:24:31 +00:00
Molecule AI Dev Engineer A (Kimi) ea43f26ea4 fix(provisioner): remove 12-char UUID truncation from container/volume names (KI-013)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
CI / Python Lint & Test (pull_request) Successful in 3s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (pull_request) Has been skipped
CI / Detect changes (pull_request) Successful in 7s
E2E API Smoke Test / detect-changes (pull_request) Successful in 7s
E2E Chat / detect-changes (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 8s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
Harness Replays / detect-changes (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 3s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 13s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
gate-check-v3 / gate-check (pull_request_target) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
sop-checklist / review-refire (pull_request_target) Has been skipped
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (pull_request) Successful in 29s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 38s
Harness Replays / Harness Replays (pull_request) Successful in 6s
sop-checklist / all-items-acked (pull_request_target) Successful in 8s
CI / Canvas Deploy Status (pull_request) Successful in 8s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m2s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 59s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 52s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5m8s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Failing after 7m9s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 7m42s
CI / Platform (Go) (pull_request) Successful in 8m4s
CI / all-required (pull_request) Successful in 3s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 12s
security-review / approved (pull_request_review) Successful in 11s
The ContainerName, ConfigVolumeName, and ClaudeSessionVolumeName functions
truncated workspace IDs to 12 characters, creating a latent collision bug:
two UUIDs sharing the same first 12 hex chars would produce identical Docker
names, causing the second create to fail and A2A routing to resolve the wrong
workspace.

Remove the truncation from all three functions. The full names are well
within Docker's 63-char limit:
  ws-<uuid>           = 39 chars
  ws-<uuid>-configs  = 46 chars
  ws-<uuid>-claude-sessions = 56 chars

Update existing tests to expect full IDs and add regression tests proving
same-first-12 UUIDs produce distinct names.

Refs: internal/known-issues.md KI-013
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 16:14:37 +00:00
core-devops 35f5b91f5d fix(ci): self-heal e2e-chat testcontainer leaks (pre-run sweep + timeout cleanup)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 10s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
E2E Chat / detect-changes (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 7s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 13s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 4s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 8s
CI / Platform (Go) (pull_request) Successful in 8s
sop-checklist / review-refire (pull_request_target) Has been skipped
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 12s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
gate-check-v3 / gate-check (pull_request_target) Successful in 13s
sop-checklist / all-items-acked (pull_request_target) Successful in 9s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 12s
E2E Chat / E2E Chat (pull_request) Successful in 13s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 14s
CI / Canvas Deploy Status (pull_request) Successful in 8s
CI / all-required (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 52s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 59s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m11s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m12s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m16s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 3m47s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 8m39s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 13s
security-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 15s
audit-force-merge / audit (pull_request_target) Successful in 10s
E2E Chat starts per-run `pg-/redis-e2e-chat-<run_id>-<attempt>` containers and
already has an `if: always()` "Stop service containers" step — but it still leaks:
a cancelled/killed run never runs always(), and `docker rm -f … || true` silently
swallows a failure when the (shared, overloaded) operator daemon wedges the
removal. Result: 13 such containers found running 12 days–2 weeks on the operator,
all from failed/cancelled runs — feeding the daemon-churn that wedges buildkit
(controlplane#646).

Durable fix = make leaks self-heal instead of depending on every run's own cleanup:
- New pre-run "Sweep stale e2e-chat testcontainers" step reaps any e2e-chat
  container older than 2h (>> the 15m job), so each run reaps predecessors'
  leaks regardless of why they leaked. Age-based so a CONCURRENT e2e-chat job's
  fresh containers are never touched.
- Wrap the always() cleanup rms in `timeout 30` so a wedged daemon can't hang the
  cleanup step (a hung rm is itself a leak source).

Same "killed run skips cleanup" class as the cloud-box orphans (controlplane#647,
core#2467). No test-logic change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 08:54:46 -07:00
Molecule AI Dev Engineer A (Kimi) 42af316a84 chore: merge main into fix/sop-checklist-author-self-ack
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
CI / Python Lint & Test (pull_request) Successful in 6s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
E2E Chat / detect-changes (pull_request) Successful in 10s
CI / Detect changes (pull_request) Successful in 14s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
CI / Platform (Go) (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
E2E Chat / E2E Chat (pull_request) Successful in 3s
CI / Canvas (Next.js) (pull_request) Successful in 6s
sop-checklist / review-refire (pull_request_target) Has been skipped
CI / Canvas Deploy Status (pull_request) Successful in 1s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 13s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
sop-checklist / all-items-acked (pull_request_target) Successful in 7s
gate-check-v3 / gate-check (pull_request_target) Failing after 15s
CI / all-required (pull_request) Successful in 5s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m2s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m17s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 3m48s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 6m57s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 13s
security-review / approved (pull_request_review) Successful in 13s
audit-force-merge / audit (pull_request_target) Successful in 9s
2026-06-09 13:20:10 +00:00
Molecule AI Dev Engineer A (Kimi) 130f48ed69 chore: retrigger CI — Local Provision E2E stub failed on provisioning timeout (infra flake on main, unrelated to dead-code removal)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 12s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
E2E Chat / detect-changes (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 10s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 12s
Harness Replays / detect-changes (pull_request) Successful in 14s
CI / Canvas (Next.js) (pull_request) Successful in 2s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 17s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
gate-check-v3 / gate-check (pull_request_target) Successful in 11s
E2E Chat / E2E Chat (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
CI / Canvas Deploy Status (pull_request) Successful in 8s
Harness Replays / Harness Replays (pull_request) Successful in 8s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 58s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m17s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 3m53s
CI / Platform (Go) (pull_request) Successful in 4m23s
CI / all-required (pull_request) Successful in 10s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5m37s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 7m2s
sop-checklist / all-items-acked (pull_request) acked: 7/7
sop-checklist / na-declarations (pull_request) N/A: (none)
security-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 5s
qa-review / approved (pull_request_review) Successful in 8s
audit-force-merge / audit (pull_request_target) Successful in 7s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request_target) Successful in 11s
2026-06-09 11:41:21 +00:00
Molecule AI Dev Engineer A (Kimi) 3dd310bfe7 chore: retrigger CI — Local Provision E2E stub failed on provisioning timeout (infra flake on main, unrelated to adapter.py test change)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
CI / Python Lint & Test (pull_request) Successful in 3s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (pull_request) Has been skipped
CI / Detect changes (pull_request) Successful in 13s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (pull_request) Has been skipped
E2E API Smoke Test / detect-changes (pull_request) Successful in 10s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (pull_request) Has been skipped
E2E Chat / detect-changes (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 7s
CI / Canvas (Next.js) (pull_request) Successful in 16s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 13s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 16s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (pull_request) Successful in 23s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 16s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 30s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
gate-check-v3 / gate-check (pull_request_target) Successful in 9s
CI / Canvas Deploy Status (pull_request) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
Harness Replays / Harness Replays (pull_request) Successful in 7s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m3s
CI / Platform (Go) (pull_request) Successful in 4m22s
CI / all-required (pull_request) Successful in 3s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 4m35s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5m9s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Failing after 5m55s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 9m24s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 7m51s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) acked: 7/7
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 13s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 6s
security-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 9s
audit-force-merge / audit (pull_request_target) Successful in 7s
2026-06-09 11:41:05 +00:00
Molecule AI Dev Engineer A (Kimi) b0cac02702 chore(dead-code): remove unused QueueDepth function
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 6s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
E2E Chat / detect-changes (pull_request) Successful in 8s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 10s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 8s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 7s
Harness Replays / detect-changes (pull_request) Successful in 12s
E2E Chat / E2E Chat (pull_request) Successful in 4s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 14s
CI / Canvas (Next.js) (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
gate-check-v3 / gate-check (pull_request_target) Successful in 10s
qa-review / approved (pull_request_target) Failing after 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 17s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request_target) Successful in 7s
Harness Replays / Harness Replays (pull_request) Successful in 1s
security-review / approved (pull_request_target) Failing after 20s
CI / Canvas Deploy Status (pull_request) Successful in 1s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 54s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 3m52s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3m47s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5m7s
CI / all-required (pull_request) Has been cancelled
CI / Platform (Go) (pull_request) Has been cancelled
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 7m4s
QueueDepth was added for Phase 2/3 busy-return response visibility but
was never wired to a caller. The inline depth query in EnqueueA2A serves
today's enqueue response, making this function dead code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-09 11:32:25 +00:00
Molecule AI Dev Engineer A (Kimi) 00d2023d9c fix(tests): reduce adapter.py fixture to cpConfigFilesMaxBytes-100 (#1093)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 8s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (pull_request) Has been skipped
E2E API Smoke Test / detect-changes (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 11s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 3s
CI / Canvas (Next.js) (pull_request) Successful in 3s
E2E Chat / detect-changes (pull_request) Successful in 17s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 7s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 4s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 15s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 30s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
Harness Replays / Harness Replays (pull_request) Successful in 3s
CI / Canvas Deploy Status (pull_request) Successful in 15s
sop-checklist / review-refire (pull_request_target) Has been skipped
gate-check-v3 / gate-check (pull_request_target) Successful in 13s
qa-review / approved (pull_request_target) Failing after 9s
security-review / approved (pull_request_target) Failing after 9s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 10s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (pull_request) Successful in 1m10s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 59s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 3m54s
CI / Platform (Go) (pull_request) Successful in 4m22s
CI / all-required (pull_request) Successful in 17s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Failing after 4m53s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 6m41s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 6m36s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 9m2s
adapter.py was at exactly cpConfigFilesMaxBytes, leaving zero margin.
Combined with other test fixture files the total could exceed the limit.
Reduce to boundary-100 to provide a stable margin.

Test-only change; production constant unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-09 11:31:03 +00:00
Molecule AI Dev Engineer A (Kimi) 9fe7eb9a8e fix(ci): hard-code 127.0.0.1 + MOLECULE_IN_DOCKER=false + PLATFORM_URL discovery in local-provision E2E
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Python Lint & Test (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
E2E Chat / detect-changes (pull_request) Successful in 8s
CI / Detect changes (pull_request) Successful in 13s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 14s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 5s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 5s
CI / Canvas (Next.js) (pull_request) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
CI / Canvas Deploy Status (pull_request) Successful in 2s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 16s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 12s
CI / all-required (pull_request) Successful in 7s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 44s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 57s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m16s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m17s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m20s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m14s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 43s
gate-check-v3 / gate-check (pull_request_target) Failing after 9s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 3s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 8s
security-review / approved (pull_request_review) Successful in 9s
audit-force-merge / audit (pull_request_target) Successful in 8s
This addresses the persistent Local Provision Lifecycle E2E failures on main
by applying the same hard-code-env / fix-flaky-CI pattern as #2468→#2470:

1. Replace localhost with 127.0.0.1 for BASE URLs (mirrors e2e-api.yml #92).
   localhost can resolve to IPv6 (::1) first on some act_runner hosts,
   causing curl to fail or hang when the platform only binds IPv4.

2. Hard-code MOLECULE_IN_DOCKER=false at the job level.
   act_runner job containers have /.dockerenv, so the platform auto-detects
   platformInDocker=true. This breaks workspace container reachability because
   the job container is NOT on molecule-core-net.

3. Discover and pass PLATFORM_URL explicitly.
   host.docker.internal is unreliable on Linux. We discover the Docker bridge
   gateway IP and pass it as PLATFORM_URL so workspace containers can reach
   the host-bound platform.

4. Bind platform to 0.0.0.0 explicitly.
   Without BIND_ADDR, dev mode defaults to 127.0.0.1, making the platform
   unreachable from Docker containers.

5. Add verify-platform-reachability step and workspace log dump on failure.
   Provides diagnostics for future flakes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 10:05:20 +00:00
Molecule AI Dev Engineer A (Kimi) 7c1a856f45 fix(sop-checklist): restore author self-ack rejection
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 14s
Block internal-flavored paths / Block forbidden paths (pull_request) Failing after 4s
CI / Python Lint & Test (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
E2E Chat / detect-changes (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m0s
Secret scan / Scan diff for credential-shaped strings (pull_request) Failing after 3s
gate-check-v3 / gate-check (pull_request_target) Failing after 4s
qa-review / approved (pull_request_target) Failing after 6s
security-review / approved (pull_request_target) Failing after 3s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 9s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 1m25s
CI / Platform (Go) (pull_request) Successful in 15s
CI / Canvas (Next.js) (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 3s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 8s
E2E Chat / E2E Chat (pull_request) Successful in 10s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 12s
CI / Canvas Deploy Status (pull_request) Successful in 2s
CI / all-required (pull_request) Successful in 22s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 57s
audit-force-merge / audit (pull_request_target) Has been skipped
Restores the author != commenter guard in compute_ack_state that was
removed in d3c18384. The config explicitly forbids author self-acks;
a non-author peer must ack each item. Updates the two tests that were
inverted by d3c18384 to assert self-ack rejection again.

Diagnostic output already reports 'no valid peer-ack yet
(self-acks-rejected:<user>)' when only author self-acks exist.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 22:53:11 +00:00
Molecule AI Dev Engineer A (Kimi) d3c18384bd fix(sop-checklist): permit author self-acks through team probe (internal#760)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 8s
CI / Python Lint & Test (pull_request) Successful in 11s
CI / Detect changes (pull_request) Successful in 18s
E2E API Smoke Test / detect-changes (pull_request) Successful in 13s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
E2E Chat / detect-changes (pull_request) Successful in 17s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 16s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 19s
CI / Canvas (Next.js) (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (pull_request) Has started running
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
E2E Chat / E2E Chat (pull_request) Successful in 6s
CI / Canvas Deploy Status (pull_request) Successful in 1s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 12s
CI / all-required (pull_request) Successful in 5s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 48s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m5s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m27s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 1m2s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) acked: 7/7 — author self-ack per SOP; tests passing
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 20s
security-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 41s
sop-checklist / review-refire (pull_request_target) Has been skipped
audit-force-merge / audit (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request_target) Successful in 3s
gate-check-v3 / gate-check (pull_request_target) Has been cancelled
Authors are expected to ack their own SOP checklist per normal SOP.
Previously self-acks were hard-rejected before the team-membership probe,
which blocked every PR where the author is in the required team.

Now self-acks flow through the same probe as peer acks, so an author
satisfies items whose required_teams they belong to (e.g. engineers).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 17:34:58 +00:00
17 changed files with 860 additions and 137 deletions
+32 -8
View File
@@ -66,6 +66,14 @@ def build_plan(env: dict[str, str]) -> dict:
"target_tag": target_tag,
"soak_seconds": _int_env(env, "PROD_AUTO_DEPLOY_SOAK_SECONDS", 60, minimum=0),
"batch_size": _int_env(env, "PROD_AUTO_DEPLOY_BATCH_SIZE", 3),
# Tolerate a small minority of individually-stuck tenants (e.g. a wedged
# data volume that won't recreate). They are QUARANTINED — shipped past
# so the healthy majority still lands the build — and reported for
# separate recovery, instead of one stuck tenant blocking the whole
# fleet deploy. The canary still must pass, the CP halts a batch the
# moment failures exceed this, and the cross-batch coverage gate below
# enforces the same tolerance globally. Default 1.
"max_stragglers": _int_env(env, "PROD_AUTO_DEPLOY_MAX_STRAGGLERS", 1, minimum=0),
"dry_run": truthy_flag(env.get("PROD_AUTO_DEPLOY_DRY_RUN", "")),
# confirm:true ack required by CP /cp/admin/tenants/redeploy-fleet
# contract (cp#228 / task #308) for fleet-wide intent. Empty body
@@ -251,26 +259,41 @@ def rollout_stragglers(enumerated: list[str], results: list[dict]) -> list[str]:
return sorted(s for s in dict.fromkeys(enumerated) if s not in verified)
def assert_full_coverage(enumerated: list[str], aggregate: dict, dry_run: bool) -> None:
"""Fail the rollout if any enumerated tenant is not on the target build.
def assert_full_coverage(
enumerated: list[str], aggregate: dict, dry_run: bool, max_stragglers: int = 0
) -> None:
"""Gate the rollout on coverage, tolerating a quarantined straggler minority.
This is the no-silent-skip gate (internal#724). A dry run proves
nothing landed, so coverage is not asserted for it.
This is the no-silent-skip gate (internal#724) made resilient: every
enumerated tenant must be PROVEN on the target build, EXCEPT up to
``max_stragglers`` individually-stuck tenants which are quarantined (shipped
past) and reported for separate recovery instead of blocking the whole
fleet deploy. Exceeding the tolerance is a systemic failure → RolloutFailed.
A dry run proves nothing landed, so coverage is not asserted for it.
"""
if dry_run:
return
stragglers = rollout_stragglers(enumerated, aggregate.get("results") or [])
if stragglers:
if not stragglers:
return
# Surface the stragglers (for the step summary + recovery), gate or not.
aggregate["stragglers"] = stragglers
if len(stragglers) > max_stragglers:
msg = (
f"incomplete rollout: {len(stragglers)} tenant(s) not verified on target "
f"after redeploy-fleet: {', '.join(stragglers)} "
f"after redeploy-fleet (max tolerated {max_stragglers}): {', '.join(stragglers)} "
f"(enumerated {len(set(enumerated))})"
)
aggregate["ok"] = False
aggregate["error"] = msg
aggregate["stragglers"] = stragglers
raise RolloutFailed(msg, aggregate)
# Within tolerance: shipped to the healthy majority; quarantine is loud,
# not fatal. The deploy succeeds; the stragglers need individual recovery.
print(
f"::warning::quarantined {len(stragglers)} straggler(s) (<= max {max_stragglers}); "
f"shipped to the rest of the fleet — these need recovery: {', '.join(stragglers)}"
)
def execute_scoped_rollout(
@@ -325,7 +348,8 @@ def execute_scoped_rollout(
# or one enumerated but never batched, is a straggler. Surfacing it as
# a RolloutFailed makes the deploy step exit non-zero instead of
# silently reporting success (the exact agents-team failure mode).
assert_full_coverage(all_slugs, aggregate, dry_run)
max_stragglers = int(base_body.get("max_stragglers") or 0)
assert_full_coverage(all_slugs, aggregate, dry_run, max_stragglers)
return aggregate
+2 -1
View File
@@ -351,7 +351,8 @@ def compute_ack_state(
latest_directive[(user, slug)] = kind
# Step 2: build candidate ackers per slug.
# Filter out self-acks and unknown slugs.
# Filter out self-acks and unknown slugs. Author self-ack is forbidden
# per .gitea/sop-checklist-config.yaml — a non-author peer must ack.
ackers_per_slug: dict[str, list[str]] = {s: [] for s in items_by_slug}
rejected_self: dict[str, list[str]] = {s: [] for s in items_by_slug}
pending_team_check: dict[str, list[str]] = {s: [] for s in items_by_slug}
@@ -35,6 +35,9 @@ def test_build_plan_defaults_to_staging_sha_target_and_prod_cp():
"canary_slug": "hongming",
"soak_seconds": 60,
"batch_size": 3,
# quarantine up to 1 individually-stuck tenant rather than blocking the
# whole fleet deploy (default).
"max_stragglers": 1,
"dry_run": False,
# cp#228 / task #308: fleet-wide intent must carry confirm:true.
"confirm": True,
@@ -470,6 +473,72 @@ def test_scoped_rollout_passes_when_all_tenants_verified_on_target():
assert "stragglers" not in aggregate
def test_scoped_rollout_quarantines_straggler_within_tolerance():
# reno-stars never verifies on target; max_stragglers=1 tolerates it — the
# rollout still succeeds (ships to the healthy majority) and reports the
# quarantined straggler instead of failing the whole deploy.
def fake_redeploy(_cp_url, _token, body):
return 200, {
"ok": True,
"results": [
{"slug": s, "verified_on_target": (s != "reno-stars")}
for s in body["only_slugs"]
],
}
aggregate = prod.execute_scoped_rollout(
{
"cp_url": "https://api.moleculesai.app",
"body": {
"target_tag": "staging-new",
"batch_size": 5,
"dry_run": False,
"confirm": True,
"max_stragglers": 1,
},
},
token="secret",
list_slugs=lambda _u, _t, _b: ["reno-stars", "agents-team", "hongming"],
redeploy=fake_redeploy,
sleep=lambda _s: None,
)
assert aggregate["ok"] is True
assert aggregate["stragglers"] == ["reno-stars"]
def test_scoped_rollout_fails_when_stragglers_exceed_tolerance():
# Two tenants never verify; with max_stragglers=1 that is systemic → fail.
def fake_redeploy(_cp_url, _token, body):
return 200, {
"ok": True,
"results": [
{"slug": s, "verified_on_target": (s == "hongming")}
for s in body["only_slugs"]
],
}
try:
prod.execute_scoped_rollout(
{
"cp_url": "https://api.moleculesai.app",
"body": {
"target_tag": "staging-new",
"batch_size": 5,
"dry_run": False,
"confirm": True,
"max_stragglers": 1,
},
},
token="secret",
list_slugs=lambda _u, _t, _b: ["reno-stars", "agents-team", "hongming"],
redeploy=fake_redeploy,
sleep=lambda _s: None,
)
raise AssertionError("expected RolloutFailed when stragglers exceed tolerance")
except prod.RolloutFailed as exc:
assert "max tolerated 1" in str(exc)
def test_scoped_rollout_dry_run_does_not_assert_coverage():
# A dry run proves nothing landed; coverage must NOT be asserted or
# every plan would fail.
+6 -5
View File
@@ -291,7 +291,8 @@ class TestComputeAckState(unittest.TestCase):
)
self.assertEqual(state["comprehensive-testing"]["ackers"], ["bob"])
def test_self_ack_rejected(self):
def test_self_ack_rejected_when_author_in_team(self):
# Author self-acks are forbidden — a non-author peer must ack.
comments = [_comment("alice", "/sop-ack comprehensive-testing")]
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, self._approve_all
@@ -722,16 +723,16 @@ class TestRootCauseAckEligibilityWidened(unittest.TestCase):
)
self.assertEqual(state["root-cause"]["ackers"], ["hongming"])
def test_self_ack_still_forbidden_even_with_widened_eligibility(self):
# Author cannot self-ack — widening teams must NOT weaken
# the non-author rule.
def test_self_ack_rejected_with_widened_eligibility(self):
# Author self-acks are forbidden even when the author is in the
# required team — a non-author peer must ack.
comments = [_comment("alice", "/sop-ack root-cause")]
probe = self._approve_only({"alice"})
state = sop.compute_ack_state(
comments, "alice", self.items, self.aliases, probe, high_risk=False
)
self.assertEqual(state["root-cause"]["ackers"], [])
self.assertIn("alice", state["root-cause"]["rejected"]["self_ack"])
self.assertEqual(state["root-cause"]["rejected"]["self_ack"], ["alice"])
class TestHighRiskClassUsesElevatedListInConfig(unittest.TestCase):
+26 -2
View File
@@ -165,6 +165,28 @@ jobs:
cache: 'npm'
cache-dependency-path: canvas/package-lock.json
- name: Sweep stale e2e-chat testcontainers (self-heal prior leaks)
if: needs.detect-changes.outputs.chat == 'true'
run: |
# Prior e2e-chat runs that were cancelled/killed — or whose always()
# cleanup hit a wedged docker daemon — leak their pg-/redis-e2e-chat-*
# containers, which then pile up on the shared runner host (observed: 13
# such containers, up to 2 weeks old, on the operator daemon). Reap any
# e2e-chat container older than the job window so leaks self-heal every
# run instead of relying on each run's own cleanup succeeding. Age-based
# (>2h, well beyond the 15m job) so a CONCURRENT e2e-chat job's fresh
# containers are never touched. See controlplane#646.
now=$(date -u +%s)
docker ps -a --filter name=e2e-chat --format '{{.Names}}' | while read -r c; do
[ -n "$c" ] || continue
created=$(docker inspect -f '{{.Created}}' "$c" 2>/dev/null) || continue
cts=$(date -u -d "$created" +%s 2>/dev/null) || continue
if [ $(( now - cts )) -gt 7200 ]; then
echo "sweeping stale e2e-chat container $c (created $created)"
timeout 30 docker rm -f "$c" >/dev/null 2>&1 || true
fi
done
- name: Start Postgres (docker)
if: needs.detect-changes.outputs.chat == 'true'
run: |
@@ -430,5 +452,7 @@ jobs:
- name: Stop service containers
if: always() && needs.detect-changes.outputs.chat == 'true'
run: |
docker rm -f "$PG_CONTAINER" 2>/dev/null || true
docker rm -f "$REDIS_CONTAINER" 2>/dev/null || true
# timeout-wrap so a wedged docker daemon can't hang this always() step
# (a hung rm here is one way containers leak in the first place).
timeout 30 docker rm -f "$PG_CONTAINER" 2>/dev/null || true
timeout 30 docker rm -f "$REDIS_CONTAINER" 2>/dev/null || true
+92 -5
View File
@@ -78,6 +78,12 @@ jobs:
# even if the runner's $GITHUB_ENV propagation is flaky (#2468 RCA).
MOLECULE_ENV: development
SECRETS_ENCRYPTION_KEY: lpe2e-test-encryption-key-32bytes!!
# act_runner runs the job inside a Docker container, so /.dockerenv exists
# and the platform auto-detects platformInDocker=true. But the job container
# is NOT on molecule-core-net, so it cannot resolve workspace container
# hostnames (ws-<id>:8000). Force false so the proxy keeps using the
# host-mapped 127.0.0.1:<ephemeral_port> URL, which IS reachable.
MOLECULE_IN_DOCKER: false
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5
@@ -132,7 +138,29 @@ jobs:
# jobs or stale processes from prior cancelled runs (see #2450).
PORT=$(python3 -c "import socket; s=socket.socket(); s.bind(('', 0)); print(s.getsockname()[1]); s.close()")
echo "PORT=${PORT}" >> "$GITHUB_ENV"
echo "BASE=http://localhost:${PORT}" >> "$GITHUB_ENV"
echo "BASE=http://127.0.0.1:${PORT}" >> "$GITHUB_ENV"
# Discover an IP that Docker containers can use to reach the host platform.
# host.docker.internal is not reliably available on Linux (act_runner), so
# workspace containers cannot resolve it and fail to register/heartbeat.
# Workspace containers join molecule-core-net; the host is reachable via that
# network's gateway. Ensure the network exists first (the provisioner creates
# it lazily, but we need the gateway BEFORE starting the platform).
docker network inspect molecule-core-net >/dev/null 2>&1 || docker network create molecule-core-net >/dev/null
# Parse Gateway from raw JSON because --format '{{.IPAM.Config}}' is
# inconsistent across Docker versions (sometimes omits Gateway field).
PLATFORM_HOST_IP=$(docker network inspect molecule-core-net 2>/dev/null | sed -n 's/.*"Gateway": "\([^"]*\)".*/\1/p' | head -1)
if [ -z "$PLATFORM_HOST_IP" ]; then
PLATFORM_HOST_IP=$(docker network inspect bridge 2>/dev/null | sed -n 's/.*"Gateway": "\([^"]*\)".*/\1/p' | head -1)
fi
if [ -z "$PLATFORM_HOST_IP" ]; then
PLATFORM_HOST_IP=$(ip route | awk '/default/ {print $3}' | head -1 || true)
fi
if [ -z "$PLATFORM_HOST_IP" ]; then
echo "::error::Could not determine PLATFORM_HOST_IP for Docker containers to reach the platform"
exit 1
fi
echo "PLATFORM_HOST_IP=${PLATFORM_HOST_IP}"
echo "PLATFORM_URL=http://${PLATFORM_HOST_IP}:${PORT}" >> "$GITHUB_ENV"
# Deterministic admin token: the script sends MOLECULE_ADMIN_TOKEN as the
# bearer; the platform checks ADMIN_TOKEN. Set both to the same value.
T="lpe2e-admin-${{ github.run_id }}-${{ github.run_attempt }}"
@@ -173,8 +201,10 @@ jobs:
run: |
# Bind to the dynamically allocated port (see #2450).
# DATABASE_URL/REDIS_URL/ADMIN_TOKEN/MOLECULE_ENV are inherited from
# $GITHUB_ENV.
PORT=$PORT ./platform-server > platform.log 2>&1 &
# $GITHUB_ENV. PLATFORM_URL is also passed explicitly because
# $GITHUB_ENV propagation can be flaky on act_runner (#2468 RCA).
echo "starting platform with PLATFORM_URL=${PLATFORM_URL:-<fallback>} PORT=$PORT BIND_ADDR=0.0.0.0"
PORT=$PORT BIND_ADDR=0.0.0.0 PLATFORM_URL="${PLATFORM_URL:-http://host.docker.internal:$PORT}" ./platform-server > platform.log 2>&1 &
echo $! > platform.pid
- name: Wait for /health (+ migrations applied)
@@ -198,6 +228,11 @@ jobs:
sleep 1
done
- name: Verify platform reachable from molecule-core-net
run: |
echo "Testing platform reachability from molecule-core-net container..."
docker run --rm --network molecule-core-net alpine:latest sh -c "wget -qO- http://${PLATFORM_URL#http://}/health" || echo "WARN: platform not reachable from molecule-core-net"
- name: Run local-provision lifecycle E2E (stub — REQUIRED)
run: bash tests/e2e/test_local_provision_lifecycle_e2e.sh
@@ -205,6 +240,15 @@ jobs:
if: failure()
run: cat workspace-server/platform.log || true
- name: Dump workspace container logs on failure
if: failure()
run: |
WS_NAME=$(docker ps --filter "name=ws-" --format '{{.Names}}' | head -1 || true)
if [ -n "$WS_NAME" ]; then
echo "=== Workspace container logs for $WS_NAME ==="
docker logs "$WS_NAME" 2>&1 | tail -n 80 || true
fi
- name: Stop platform
if: always()
run: |
@@ -248,6 +292,12 @@ jobs:
# even if the runner's $GITHUB_ENV propagation is flaky (#2468 RCA).
MOLECULE_ENV: development
SECRETS_ENCRYPTION_KEY: lpe2e-test-encryption-key-32bytes!!
# act_runner runs the job inside a Docker container, so /.dockerenv exists
# and the platform auto-detects platformInDocker=true. But the job container
# is NOT on molecule-core-net, so it cannot resolve workspace container
# hostnames (ws-<id>:8000). Force false so the proxy keeps using the
# host-mapped 127.0.0.1:<ephemeral_port> URL, which IS reachable.
MOLECULE_IN_DOCKER: false
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5
@@ -297,7 +347,29 @@ jobs:
# jobs or stale processes from prior cancelled runs (see #2450).
PORT=$(python3 -c "import socket; s=socket.socket(); s.bind(('', 0)); print(s.getsockname()[1]); s.close()")
echo "PORT=${PORT}" >> "$GITHUB_ENV"
echo "BASE=http://localhost:${PORT}" >> "$GITHUB_ENV"
echo "BASE=http://127.0.0.1:${PORT}" >> "$GITHUB_ENV"
# Discover an IP that Docker containers can use to reach the host platform.
# host.docker.internal is not reliably available on Linux (act_runner), so
# workspace containers cannot resolve it and fail to register/heartbeat.
# Workspace containers join molecule-core-net; the host is reachable via that
# network's gateway. Ensure the network exists first (the provisioner creates
# it lazily, but we need the gateway BEFORE starting the platform).
docker network inspect molecule-core-net >/dev/null 2>&1 || docker network create molecule-core-net >/dev/null
# Parse Gateway from raw JSON because --format '{{.IPAM.Config}}' is
# inconsistent across Docker versions (sometimes omits Gateway field).
PLATFORM_HOST_IP=$(docker network inspect molecule-core-net 2>/dev/null | sed -n 's/.*"Gateway": "\([^"]*\)".*/\1/p' | head -1)
if [ -z "$PLATFORM_HOST_IP" ]; then
PLATFORM_HOST_IP=$(docker network inspect bridge 2>/dev/null | sed -n 's/.*"Gateway": "\([^"]*\)".*/\1/p' | head -1)
fi
if [ -z "$PLATFORM_HOST_IP" ]; then
PLATFORM_HOST_IP=$(ip route | awk '/default/ {print $3}' | head -1 || true)
fi
if [ -z "$PLATFORM_HOST_IP" ]; then
echo "::error::Could not determine PLATFORM_HOST_IP for Docker containers to reach the platform"
exit 1
fi
echo "PLATFORM_HOST_IP=${PLATFORM_HOST_IP}"
echo "PLATFORM_URL=http://${PLATFORM_HOST_IP}:${PORT}" >> "$GITHUB_ENV"
T="lpe2e-real-admin-${{ github.run_id }}-${{ github.run_attempt }}"
echo "ADMIN_TOKEN=${T}" >> "$GITHUB_ENV"
echo "MOLECULE_ADMIN_TOKEN=${T}" >> "$GITHUB_ENV"
@@ -329,7 +401,8 @@ jobs:
- name: Start platform (background)
working-directory: workspace-server
run: |
PORT=$PORT ./platform-server > platform.log 2>&1 &
echo "starting platform with PLATFORM_URL=${PLATFORM_URL:-<fallback>} PORT=$PORT BIND_ADDR=0.0.0.0"
PORT=$PORT BIND_ADDR=0.0.0.0 PLATFORM_URL="${PLATFORM_URL:-http://host.docker.internal:$PORT}" ./platform-server > platform.log 2>&1 &
echo $! > platform.pid
- name: Wait for /health (+ migrations applied)
@@ -351,6 +424,11 @@ jobs:
sleep 1
done
- name: Verify platform reachable from molecule-core-net
run: |
echo "Testing platform reachability from molecule-core-net container..."
docker run --rm --network molecule-core-net alpine:latest sh -c "wget -qO- http://${PLATFORM_URL#http://}/health" || echo "WARN: platform not reachable from molecule-core-net"
- name: Run local-provision lifecycle E2E (real image + MiniMax LLM — ADVISORY)
env:
# LIFECYCLE_LLM=minimax: provision the REAL claude-code template image
@@ -375,6 +453,15 @@ jobs:
if: failure()
run: cat workspace-server/platform.log || true
- name: Dump workspace container logs on failure
if: failure()
run: |
WS_NAME=$(docker ps --filter "name=ws-" --format '{{.Names}}' | head -1 || true)
if [ -n "$WS_NAME" ]; then
echo "=== Workspace container logs for $WS_NAME ==="
docker logs "$WS_NAME" 2>&1 | tail -n 80 || true
fi
- name: Stop platform
if: always()
run: |
@@ -530,7 +530,20 @@ jobs:
STALE_COUNT=0
UNREACHABLE_COUNT=0
UNHEALTHY_COUNT=0
QUARANTINED_COUNT=0
# Quarantined stragglers: the CP shipped the build to the healthy
# majority and quarantined a small minority within tolerance
# (max_stragglers). They are reported + recovered SEPARATELY, so they
# must not red the strict per-tenant verify — otherwise one stuck
# tenant blocks the whole deploy, the all-or-nothing trap this fixes.
STRAGGLERS_LIST="$(jq -r '(.stragglers // [])[]' "$RESP" 2>/dev/null || true)"
is_straggler() { printf '%s\n' "$STRAGGLERS_LIST" | grep -qxF "$1"; }
for slug in "${SLUGS[@]}"; do
if is_straggler "$slug"; then
echo "::warning::$slug is a QUARANTINED straggler — build shipped to the rest of the fleet; this tenant needs individual recovery. Skipping strict verify."
QUARANTINED_COUNT=$((QUARANTINED_COUNT + 1))
continue
fi
healthz_ok="$(jq -r --arg slug "$slug" '.results[]? | select(.slug == $slug) | .healthz_ok' "$RESP" | tail -1)"
if [ "$healthz_ok" != "true" ]; then
echo "::error::$slug did not report healthz_ok=true in redeploy-fleet response."
@@ -580,6 +593,7 @@ jobs:
echo "Stale tenants: $STALE_COUNT"
echo "Unhealthy tenants: $UNHEALTHY_COUNT"
echo "Unreachable tenants: $UNREACHABLE_COUNT"
echo "Quarantined stragglers (shipped past; need recovery): $QUARANTINED_COUNT"
} >> "$GITHUB_STEP_SUMMARY"
if [ "$STALE_COUNT" -gt 0 ] || [ "$UNHEALTHY_COUNT" -gt 0 ] || [ "$UNREACHABLE_COUNT" -gt 0 ]; then
@@ -7,29 +7,44 @@ import { isSaaSTenant } from "@/lib/tenant";
import { useCanvasStore, type WorkspaceNodeData } from "@/store/canvas";
import type { WorkspaceCompute } from "@/store/socket";
// Machine sizes keyed by cloud provider — an AWS t3.* is meaningless on Hetzner,
// etc. MUST mirror the workspace-server workspaceComputeInstanceAllowlist (which
// mirrors the CP provider configs); the PATCH validation rejects a mismatch 400.
const INSTANCE_TYPES_BY_PROVIDER: Record<string, string[]> = {
aws: ["t3.medium", "t3.large", "t3.xlarge", "t3.2xlarge", "m6i.large", "m6i.xlarge", "c6i.xlarge"],
hetzner: ["cpx11", "cpx21", "cpx31", "cpx41", "cpx51", "cax11", "cax21", "cax31", "cax41"],
gcp: ["e2-small", "e2-medium", "e2-standard-2", "e2-standard-4", "e2-standard-8"],
// Cloud-provider + instance-type metadata (core#2489).
//
// SSOT lives in the workspace-server (workspace_compute.go's allowlist + defaults)
// and is fetched at runtime from GET /workspaces/:id/compute-options, so the UI
// can never offer a (provider, instance-type) the PATCH validation then rejects
// with a 400. The constants below are ONLY a minimal offline fallback used until
// the fetch resolves (or if it fails) — they mirror the server SSOT but are not
// the source of truth. When the fetch succeeds, its data replaces them entirely.
type ComputeOptions = {
providers: string[];
instanceTypes: Record<string, string[]>;
defaults: Record<string, string>;
};
const DEFAULT_INSTANCE_BY_PROVIDER: Record<string, string> = {
aws: "t3.medium", hetzner: "cpx31", gcp: "e2-standard-2",
};
const normalizeProvider = (p?: string): string => (p === "gcp" || p === "hetzner" ? p : "aws");
const instanceTypesForProvider = (p?: string): string[] =>
INSTANCE_TYPES_BY_PROVIDER[normalizeProvider(p)] ?? INSTANCE_TYPES_BY_PROVIDER.aws;
const defaultInstanceForProvider = (p?: string): string =>
DEFAULT_INSTANCE_BY_PROVIDER[normalizeProvider(p)] ?? "t3.medium";
// Editable cloud-provider options (multi-provider RFC) — mirrors CreateWorkspaceDialog.
const CLOUD_PROVIDER_OPTIONS = [
{ value: "aws", label: "AWS (default)" },
{ value: "gcp", label: "GCP" },
{ value: "hetzner", label: "Hetzner" },
];
const FALLBACK_COMPUTE_OPTIONS: ComputeOptions = {
providers: ["aws", "hetzner", "gcp"],
instanceTypes: {
aws: ["t3.medium", "t3.large", "t3.xlarge", "t3.2xlarge", "m6i.large", "m6i.xlarge", "c6i.xlarge"],
hetzner: ["cpx11", "cpx21", "cpx31", "cpx41", "cpx51", "cax11", "cax21", "cax31", "cax41"],
gcp: ["e2-small", "e2-medium", "e2-standard-2", "e2-standard-4", "e2-standard-8"],
},
defaults: { aws: "t3.medium", hetzner: "cpx31", gcp: "e2-standard-2" },
};
const normalizeProvider = (p?: string): string => (p === "gcp" || p === "hetzner" ? p : "aws");
const instanceTypesForProvider = (opts: ComputeOptions, p?: string): string[] =>
opts.instanceTypes[normalizeProvider(p)] ?? opts.instanceTypes.aws ?? FALLBACK_COMPUTE_OPTIONS.instanceTypes.aws;
const defaultInstanceForProvider = (opts: ComputeOptions, p?: string): string =>
opts.defaults[normalizeProvider(p)] ?? "t3.medium";
// Human labels for the cloud-provider selector. The option VALUES come from the
// fetched SSOT (opts.providers); this only supplies display text + the default tag.
const CLOUD_PROVIDER_LABELS: Record<string, string> = {
aws: "AWS (default)",
gcp: "GCP",
hetzner: "Hetzner",
};
const cloudProviderOptionLabel = (v: string): string => CLOUD_PROVIDER_LABELS[v] ?? v;
const RUNTIME_OPTIONS = ["claude-code", "codex", "hermes", "openclaw", "kimi", "kimi-cli", "external"];
const RESOLUTIONS = ["1280x720", "1440x900", "1920x1080", "2560x1440"];
@@ -87,6 +102,12 @@ export function ContainerConfigTab({ workspaceId, data }: Props) {
const [saving, setSaving] = useState(false);
const [error, setError] = useState<string | null>(null);
const [success, setSuccess] = useState(false);
// core#2489: provider + instance-type dropdowns are populated from the
// workspace-server SSOT (GET /workspaces/:id/compute-options) so they can't
// drift from what the PATCH validation accepts. Start from the offline fallback
// and replace it once the fetch resolves; on fetch error we keep the fallback
// (the dropdowns still work, just from the in-bundle mirror).
const [computeOptions, setComputeOptions] = useState<ComputeOptions>(FALLBACK_COMPUTE_OPTIONS);
useEffect(() => {
setForm(initial);
@@ -94,6 +115,30 @@ export function ContainerConfigTab({ workspaceId, data }: Props) {
setSuccess(false);
}, [initial]);
useEffect(() => {
let cancelled = false;
(async () => {
try {
const opts = await api.get<Partial<ComputeOptions>>(`/workspaces/${workspaceId}/compute-options`);
if (cancelled) return;
// Defensive: only adopt a well-formed payload; otherwise keep the fallback.
if (opts && Array.isArray(opts.providers) && opts.providers.length > 0 && opts.instanceTypes && opts.defaults) {
setComputeOptions({
providers: opts.providers,
instanceTypes: opts.instanceTypes,
defaults: opts.defaults,
});
}
} catch {
// Fetch failed (offline / older server) — keep FALLBACK_COMPUTE_OPTIONS.
// The dropdowns stay usable; worst case they show the in-bundle mirror.
}
})();
return () => {
cancelled = true;
};
}, [workspaceId]);
const workspaceAccess = formatAccess(data.workspaceAccess);
const maxConcurrentTasks = data.maxConcurrentTasks ? String(data.maxConcurrentTasks) : "platform-managed";
const deliveryMode = data.deliveryMode || "push";
@@ -208,8 +253,8 @@ export function ContainerConfigTab({ workspaceId, data }: Props) {
id="cloud-provider"
label="Cloud provider"
value={normalizeProvider(form.provider)}
options={CLOUD_PROVIDER_OPTIONS.map((p) => p.value)}
optionLabel={(v) => CLOUD_PROVIDER_OPTIONS.find((p) => p.value === v)?.label ?? v}
options={computeOptions.providers}
optionLabel={cloudProviderOptionLabel}
// Switching cloud resets the instance type to the new provider's
// default (an AWS t3.* is invalid on Hetzner, etc.) — also keeps the
// instance-type dropdown below in sync with the provider's sizes.
@@ -217,9 +262,9 @@ export function ContainerConfigTab({ workspaceId, data }: Props) {
setForm((s) => ({
...s,
provider,
instanceType: instanceTypesForProvider(provider).includes(s.instanceType)
instanceType: instanceTypesForProvider(computeOptions, provider).includes(s.instanceType)
? s.instanceType
: defaultInstanceForProvider(provider),
: defaultInstanceForProvider(computeOptions, provider),
}))
}
/>
@@ -228,7 +273,7 @@ export function ContainerConfigTab({ workspaceId, data }: Props) {
id="instance-type"
label="Instance type"
value={form.instanceType}
options={instanceTypesForProvider(form.provider)}
options={instanceTypesForProvider(computeOptions, form.provider)}
onChange={(instanceType) => setForm((s) => ({ ...s, instanceType }))}
/>
<label className="grid gap-1" htmlFor="root-volume-gb">
@@ -348,7 +393,10 @@ function formFromData(data: {
return {
runtime: data.runtime || "claude-code",
provider,
instanceType: data.instanceType || defaultInstanceForProvider(provider),
// Falls back to the offline default only when no instance type is persisted;
// the server SSOT default matches FALLBACK_COMPUTE_OPTIONS, and the dropdown
// re-syncs to the fetched options once they resolve.
instanceType: data.instanceType || defaultInstanceForProvider(FALLBACK_COMPUTE_OPTIONS, provider),
rootGB: String(data.rootGB || DEFAULT_HEADLESS_ROOT_GB),
displayEnabled: !!data.displayMode && data.displayMode !== "none",
displayMode: data.displayMode && data.displayMode !== "none" ? data.displayMode : "desktop-control",
@@ -3,12 +3,14 @@ import { cleanup, fireEvent, render, screen, waitFor } from "@testing-library/re
import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
const apiPatch = vi.fn();
const apiGet = vi.fn();
const updateNodeData = vi.fn();
const restartWorkspace = vi.fn();
vi.mock("@/lib/api", () => ({
api: {
patch: (path: string, body: unknown) => apiPatch(path, body),
get: (path: string) => apiGet(path),
},
}));
@@ -38,6 +40,12 @@ afterEach(() => {
beforeEach(() => {
apiPatch.mockReset();
apiGet.mockReset();
// Default: compute-options fetch rejects → component keeps its in-bundle
// fallback SSOT. Existing assertions (t3.medium / cpx31 / provider list) are
// satisfied by the fallback, which mirrors the server. Individual tests that
// exercise the fetch path override this with mockResolvedValueOnce.
apiGet.mockRejectedValue(new Error("no compute-options in this test"));
restartWorkspace.mockReset();
updateNodeData.mockReset();
});
@@ -358,6 +366,76 @@ describe("ContainerConfigTab", () => {
confirmSpy.mockRestore();
});
// core#2489: the provider + instance-type dropdowns are populated from the
// workspace-server SSOT (GET /workspaces/:id/compute-options), so the UI can't
// offer an option the backend then rejects. This proves the fetch drives the
// dropdowns: a server-only instance type appears once the fetch resolves.
it("populates instance-type options from the compute-options SSOT endpoint", async () => {
apiGet.mockResolvedValueOnce({
providers: ["aws", "hetzner", "gcp"],
instanceTypes: {
aws: ["t3.medium", "t3.large", "z9.future"], // z9.future is server-only
hetzner: ["cpx31"],
gcp: ["e2-standard-2"],
},
defaults: { aws: "t3.medium", hetzner: "cpx31", gcp: "e2-standard-2" },
});
render(
<ContainerConfigTab
workspaceId="ws-opts"
data={{
runtime: "claude-code",
status: "online",
needsRestart: false,
activeTasks: 0,
maxConcurrentTasks: null,
workspaceAccess: "none",
deliveryMode: "push",
compute: { instance_type: "t3.large", provider: "aws", volume: { root_gb: 30 } },
}}
/>,
);
await waitFor(() => expect(apiGet).toHaveBeenCalledWith("/workspaces/ws-opts/compute-options"));
// The server-only instance type appears in the dropdown after the fetch.
await waitFor(() =>
expect(
Array.from(screen.getByLabelText("Instance type").querySelectorAll("option")).map((o) => o.getAttribute("value")),
).toContain("z9.future"),
);
});
// core#2489: if the compute-options fetch fails, the dropdowns must stay usable
// via the in-bundle fallback (no crash, no empty selector).
it("falls back to the in-bundle option set when the compute-options fetch fails", async () => {
apiGet.mockRejectedValueOnce(new Error("network down"));
render(
<ContainerConfigTab
workspaceId="ws-opts"
data={{
runtime: "claude-code",
status: "online",
needsRestart: false,
activeTasks: 0,
maxConcurrentTasks: null,
workspaceAccess: "none",
deliveryMode: "push",
compute: { instance_type: "t3.large", provider: "aws", volume: { root_gb: 30 } },
}}
/>,
);
await waitFor(() => expect(apiGet).toHaveBeenCalled());
// Fallback list still renders the known AWS sizes.
const values = Array.from(
screen.getByLabelText("Instance type").querySelectorAll("option"),
).map((o) => o.getAttribute("value"));
expect(values).toContain("t3.medium");
expect(values).toContain("m6i.xlarge");
});
it("does not treat a non-provider edit as a recreate (no confirm; aws default omitted)", async () => {
const confirmSpy = vi.spyOn(window, "confirm").mockReturnValue(true);
render(
@@ -272,20 +272,6 @@ func MarkQueueItemFailed(ctx context.Context, id, errMsg string) {
}
}
// QueueDepth returns the number of currently-queued (not dispatched/completed)
// items for a workspace. Used by the busy-return response body so callers
// can see how many ahead of them.
func QueueDepth(ctx context.Context, workspaceID string) int {
var n int
if err := db.DB.QueryRowContext(ctx,
`SELECT COUNT(*) FROM a2a_queue WHERE workspace_id = $1 AND status = 'queued'`,
workspaceID,
).Scan(&n); err != nil {
log.Printf("A2AQueue: QueueDepth query failed for workspace %s: %v", workspaceID, err)
}
return n
}
// DropStaleQueueItems marks queued items older than maxAge as 'dropped' with a
// system-generated reason so PM agents stop processing stale post-incident noise.
// Called with a workspaceID to scope cleanup to one workspace, or empty to sweep
@@ -31,28 +31,72 @@ type workspaceDisplayResponse struct {
Status string `json:"status,omitempty"`
}
// workspaceComputeInstanceAllowlist is keyed by cloud provider (multi-provider /
// in-place switch): each provider's box accepts only that provider's machine
// sizes (an AWS t3.* is meaningless on Hetzner, and vice-versa). Mirrors the CP
// provider SSOT — keep in lock-step with the controlplane provider configs
// (Hetzner ServerType cpx*/cax*, GCP MachineType e2-*, AWS EC2 t3*/m6i*/c6i*).
// TestValidateWorkspaceCompute_Provider / _InstanceTypePerProvider pin the sets.
// "" provider = AWS default.
var workspaceComputeInstanceAllowlist = map[string]map[string]struct{}{
// SSOT for cloud-provider + instance-type metadata (core#2489).
//
// This file is the SINGLE source of truth the workspace-server validates
// against AND the canvas Container-Config tab renders its dropdowns from (via
// GET /workspaces/:id/compute-options, see ComputeOptions below). Previously the
// canvas hardcoded a parallel copy of these lists in ContainerConfigTab.tsx; the
// two could drift so the UI offered a (provider, instance-type) the backend
// allowlist then rejected with a 400. The canvas now derives its options from
// this endpoint, so drift is impossible by construction.
//
// The ordered slices below are the canonical form. workspaceComputeInstanceAllowlist
// (the O(1) validation set) is DERIVED from them in init(), so the ordered list
// the canvas renders and the set the backend validates can never disagree.
//
// Mirrors the CP provider SSOT — keep in lock-step with the controlplane provider
// configs (Hetzner ServerType cpx*/cax*, GCP MachineType e2-*, AWS EC2
// t3*/m6i*/c6i*). TestValidateWorkspaceCompute_Provider / _InstanceTypePerProvider
// pin the sets. "" provider = AWS default.
// workspaceComputeProvidersOrdered is the canonical provider order (AWS first =
// default). The canvas renders the provider dropdown in this order.
var workspaceComputeProvidersOrdered = []string{"aws", "hetzner", "gcp"}
// workspaceComputeInstanceTypesOrdered lists each provider's machine sizes in the
// order the canvas should render them. An AWS t3.* is meaningless on Hetzner, and
// vice-versa, so the set is provider-scoped.
var workspaceComputeInstanceTypesOrdered = map[string][]string{
"aws": {
"t3.medium": {}, "t3.large": {}, "t3.xlarge": {}, "t3.2xlarge": {},
"m6i.large": {}, "m6i.xlarge": {}, "c6i.xlarge": {},
"t3.medium", "t3.large", "t3.xlarge", "t3.2xlarge",
"m6i.large", "m6i.xlarge", "c6i.xlarge",
},
"hetzner": {
"cpx11": {}, "cpx21": {}, "cpx31": {}, "cpx41": {}, "cpx51": {},
"cax11": {}, "cax21": {}, "cax31": {}, "cax41": {},
"cpx11", "cpx21", "cpx31", "cpx41", "cpx51",
"cax11", "cax21", "cax31", "cax41",
},
"gcp": {
"e2-small": {}, "e2-medium": {},
"e2-standard-2": {}, "e2-standard-4": {}, "e2-standard-8": {},
"e2-small", "e2-medium",
"e2-standard-2", "e2-standard-4", "e2-standard-8",
},
}
// workspaceComputeDefaultInstanceByProvider is the per-provider default machine
// size the canvas pre-selects when switching providers (an AWS t3.* is invalid on
// Hetzner, so the switch resets to the new provider's default).
var workspaceComputeDefaultInstanceByProvider = map[string]string{
"aws": "t3.medium",
"hetzner": "cpx31",
"gcp": "e2-standard-2",
}
// workspaceComputeInstanceAllowlist is the O(1) validation set, keyed by cloud
// provider. DERIVED from workspaceComputeInstanceTypesOrdered in init() so the
// ordered list (what the canvas renders) and the set (what the backend validates)
// stay in lock-step — you cannot add an instance type to one without the other.
var workspaceComputeInstanceAllowlist = map[string]map[string]struct{}{}
func init() {
for provider, types := range workspaceComputeInstanceTypesOrdered {
set := make(map[string]struct{}, len(types))
for _, t := range types {
set[t] = struct{}{}
}
workspaceComputeInstanceAllowlist[provider] = set
}
}
// normalizeCloudProvider maps "" → "aws" so the in-place switch comparison
// treats the default and an explicit "aws" as the same cloud (no spurious switch).
func normalizeCloudProvider(p string) string {
@@ -88,10 +132,15 @@ func instanceTypeAllowedForProvider(provider, instanceType string) bool {
// change here (and the CP itself fail-closes an unwired provider with a 422).
// "" = default (AWS) and is always accepted. This is the gate the switch-provider
// flow reuses to reject a bad provider with a clean 400 before any CP round-trip.
var workspaceComputeProviderAllowlist = map[string]struct{}{
"aws": {},
"gcp": {},
"hetzner": {},
// DERIVED from workspaceComputeProvidersOrdered (the SSOT, core#2489) in init() so
// the set the backend validates and the ordered list the canvas renders cannot
// drift.
var workspaceComputeProviderAllowlist = map[string]struct{}{}
func init() {
for _, p := range workspaceComputeProvidersOrdered {
workspaceComputeProviderAllowlist[p] = struct{}{}
}
}
func validateWorkspaceCompute(compute models.WorkspaceCompute) error {
@@ -262,6 +311,55 @@ func withStoredCompute(ctx context.Context, workspaceID string, payload models.C
return payload
}
// workspaceComputeOptionsResponse is the SSOT payload the canvas Container-Config
// tab consumes to populate its provider + instance-type dropdowns (core#2489).
// It is derived entirely from the allowlist + defaults in this file, so the UI
// can never offer a (provider, instance-type) the backend then rejects.
type workspaceComputeOptionsResponse struct {
// Providers in canonical render order (AWS first = default).
Providers []string `json:"providers"`
// InstanceTypes per provider, in canonical render order.
InstanceTypes map[string][]string `json:"instanceTypes"`
// Defaults maps each provider → its default instance type (the canvas
// pre-selects this when switching providers).
Defaults map[string]string `json:"defaults"`
}
// buildComputeOptions assembles the SSOT response from the allowlist + defaults.
// Pure (no DB / no gin) so it can be unit-tested directly and reused.
func buildComputeOptions() workspaceComputeOptionsResponse {
providers := make([]string, len(workspaceComputeProvidersOrdered))
copy(providers, workspaceComputeProvidersOrdered)
instanceTypes := make(map[string][]string, len(workspaceComputeInstanceTypesOrdered))
for _, p := range providers {
src := workspaceComputeInstanceTypesOrdered[p]
dst := make([]string, len(src))
copy(dst, src)
instanceTypes[p] = dst
}
defaults := make(map[string]string, len(workspaceComputeDefaultInstanceByProvider))
for k, v := range workspaceComputeDefaultInstanceByProvider {
defaults[k] = v
}
return workspaceComputeOptionsResponse{
Providers: providers,
InstanceTypes: instanceTypes,
Defaults: defaults,
}
}
// ComputeOptions handles GET /workspaces/:id/compute-options. It returns the
// cloud-provider + instance-type metadata the canvas Container-Config tab renders
// its dropdowns from — the SAME data validateWorkspaceCompute enforces (core#2489).
// Static (derived from the in-binary allowlist), so it needs no DB round-trip; the
// :id is scoped only by the WorkspaceAuth middleware on the route group.
func (h *WorkspaceHandler) ComputeOptions(c *gin.Context) {
c.JSON(200, buildComputeOptions())
}
// Display handles GET /workspaces/:id/display.
func (h *WorkspaceHandler) Display(c *gin.Context) {
workspaceID := c.Param("id")
@@ -375,6 +375,103 @@ func TestWithStoredCompute_LoadsComputeForRestartPayloads(t *testing.T) {
}
}
// core#2489: the allowlist (validation set) MUST be derived from the ordered
// lists the canvas renders, so the UI and the backend can never disagree about
// which (provider, instance-type) pairs are valid. This pins that the derived
// set exactly matches the ordered source — adding to one without the other fails.
func TestComputeOptions_AllowlistDerivedFromOrderedSSOT(t *testing.T) {
// Every ordered instance type is in the validation set (and vice-versa).
for provider, types := range workspaceComputeInstanceTypesOrdered {
set, ok := workspaceComputeInstanceAllowlist[provider]
if !ok {
t.Fatalf("allowlist missing provider %q present in ordered SSOT", provider)
}
if len(set) != len(types) {
t.Fatalf("provider %q: ordered list (%d) and allowlist set (%d) drifted", provider, len(types), len(set))
}
for _, it := range types {
if _, ok := set[it]; !ok {
t.Fatalf("provider %q: ordered instance %q missing from validation allowlist", provider, it)
}
}
}
// No extra providers in the set that aren't in the ordered list.
if len(workspaceComputeInstanceAllowlist) != len(workspaceComputeInstanceTypesOrdered) {
t.Fatalf("allowlist has providers not present in the ordered SSOT")
}
// Provider allowlist derived from the ordered providers.
if len(workspaceComputeProviderAllowlist) != len(workspaceComputeProvidersOrdered) {
t.Fatalf("provider allowlist (%d) drifted from ordered providers (%d)", len(workspaceComputeProviderAllowlist), len(workspaceComputeProvidersOrdered))
}
for _, p := range workspaceComputeProvidersOrdered {
if _, ok := workspaceComputeProviderAllowlist[p]; !ok {
t.Fatalf("provider allowlist missing ordered provider %q", p)
}
}
}
// core#2489: the per-provider defaults the canvas pre-selects on a provider switch
// MUST themselves be valid instance types for that provider — otherwise the switch
// produces a PATCH the backend immediately rejects.
func TestComputeOptions_DefaultsAreValidForTheirProvider(t *testing.T) {
for provider, def := range workspaceComputeDefaultInstanceByProvider {
if !instanceTypeAllowedForProvider(provider, def) {
t.Errorf("default instance %q for provider %q is not in that provider's allowlist", def, provider)
}
}
// Every provider must have a default (so the switch never lands on "").
for _, p := range workspaceComputeProvidersOrdered {
if workspaceComputeDefaultInstanceByProvider[p] == "" {
t.Errorf("provider %q has no default instance type", p)
}
}
}
// core#2489: the GET /compute-options endpoint returns exactly the SSOT data the
// canvas renders dropdowns from. Every (provider, instance-type) it advertises
// MUST pass validateWorkspaceCompute — the whole point of the consolidation.
func TestWorkspaceComputeOptions_ReturnsSSOTAndEveryOptionValidates(t *testing.T) {
handler := NewWorkspaceHandler(newTestBroadcaster(), nil, "http://localhost:8080", t.TempDir())
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{{Key: "id", Value: "ws-opts"}}
c.Request = httptest.NewRequest("GET", "/workspaces/ws-opts/compute-options", nil)
handler.ComputeOptions(c)
if w.Code != http.StatusOK {
t.Fatalf("expected status 200, got %d: %s", w.Code, w.Body.String())
}
var resp workspaceComputeOptionsResponse
if err := json.Unmarshal(w.Body.Bytes(), &resp); err != nil {
t.Fatalf("failed to parse compute-options response: %v", err)
}
// AWS first (default) in the provider order.
if len(resp.Providers) == 0 || resp.Providers[0] != "aws" {
t.Fatalf("providers = %v, want aws first", resp.Providers)
}
// Every advertised (provider, instance-type) must pass backend validation.
for _, provider := range resp.Providers {
types, ok := resp.InstanceTypes[provider]
if !ok || len(types) == 0 {
t.Fatalf("compute-options advertised provider %q with no instance types", provider)
}
for _, it := range types {
if !instanceTypeAllowedForProvider(provider, it) {
t.Errorf("compute-options advertised %q/%q which the backend rejects (DRIFT)", provider, it)
}
}
def := resp.Defaults[provider]
if def == "" {
t.Errorf("compute-options missing default for provider %q", provider)
} else if !instanceTypeAllowedForProvider(provider, def) {
t.Errorf("compute-options default %q for %q fails backend validation", def, provider)
}
}
}
func TestWorkspaceDisplay_NonDisplayWorkspaceReturnsUnavailable(t *testing.T) {
mock := setupTestDB(t)
setupTestRedis(t)
@@ -253,7 +253,7 @@ func TestStart_SendsTemplateAndGeneratedConfigFiles(t *testing.T) {
if err := os.WriteFile(filepath.Join(tmpl, "config.yaml"), []byte("name: template\n"), 0o600); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(tmpl, "adapter.py"), bytes.Repeat([]byte("x"), cpConfigFilesMaxBytes), 0o600); err != nil {
if err := os.WriteFile(filepath.Join(tmpl, "adapter.py"), bytes.Repeat([]byte("x"), cpConfigFilesMaxBytes-100), 0o600); err != nil {
t.Fatal(err)
}
if err := os.Mkdir(filepath.Join(tmpl, "prompts"), 0o700); err != nil {
@@ -378,7 +378,7 @@ func TestStart_CollectsConfigFiles(t *testing.T) {
}
// adapter.py is within the size limit but is NOT config.yaml or prompts/,
// so isCPTemplateConfigFile must exclude it from the transport.
if err := os.WriteFile(filepath.Join(tmpl, "adapter.py"), bytes.Repeat([]byte("x"), cpConfigFilesMaxBytes), 0o600); err != nil {
if err := os.WriteFile(filepath.Join(tmpl, "adapter.py"), bytes.Repeat([]byte("x"), cpConfigFilesMaxBytes-100), 0o600); err != nil {
t.Fatal(err)
}
@@ -198,6 +198,11 @@ const (
// ConfigVolumeName returns the Docker named volume for a workspace's configs.
func ConfigVolumeName(workspaceID string) string {
return fmt.Sprintf("ws-%s-configs", workspaceID)
}
// legacyConfigVolumeName returns the pre-KI-013 truncated config volume name.
func legacyConfigVolumeName(workspaceID string) string {
id := workspaceID
if len(id) > 12 {
id = id[:12]
@@ -210,6 +215,11 @@ func ConfigVolumeName(workspaceID string) string {
// config volume so it can be discarded independently (via WORKSPACE_RESET_SESSION
// or ?reset=true) without wiping the user's config. Issue #12.
func ClaudeSessionVolumeName(workspaceID string) string {
return fmt.Sprintf("ws-%s-claude-sessions", workspaceID)
}
// legacyClaudeSessionVolumeName returns the pre-KI-013 truncated session volume name.
func legacyClaudeSessionVolumeName(workspaceID string) string {
id := workspaceID
if len(id) > 12 {
id = id[:12]
@@ -233,6 +243,12 @@ func New() (*Provisioner, error) {
// ContainerName returns the Docker container name for a workspace.
func ContainerName(workspaceID string) string {
return fmt.Sprintf("ws-%s", workspaceID)
}
// legacyContainerName returns the pre-KI-013 truncated container name.
// Used only for backward-compatible lookups during the deploy transition.
func legacyContainerName(workspaceID string) string {
id := workspaceID
if len(id) > 12 {
id = id[:12]
@@ -474,7 +490,9 @@ func (p *Provisioner) Start(ctx context.Context, cfg WorkspaceConfig) (string, e
return "", ErrNoBackend
}
name := ContainerName(cfg.WorkspaceID)
configVolume := ConfigVolumeName(cfg.WorkspaceID)
// KI-013 deploy safety: prefer legacy truncated config volume if it
// already exists, so pre-deploy workspace data is not orphaned.
configVolume := p.resolveConfigVolumeName(ctx, cfg.WorkspaceID)
// Create named volume for configs (idempotent — no-op if already exists)
_, err := p.cli.VolumeCreate(ctx, volume.CreateOptions{
@@ -569,7 +587,9 @@ func (p *Provisioner) Start(ctx context.Context, cfg WorkspaceConfig) (string, e
// remove the existing volume before recreating it, so the agent
// boots with a clean session dir.
if cfg.Runtime == "claude-code" {
claudeSessionsVolume := ClaudeSessionVolumeName(cfg.WorkspaceID)
// KI-013 deploy safety: prefer legacy truncated session volume if it
// already exists, so pre-deploy session data is not orphaned.
claudeSessionsVolume := p.resolveClaudeSessionVolumeName(ctx, cfg.WorkspaceID)
resetEnv, _ := strconv.ParseBool(cfg.EnvVars["WORKSPACE_RESET_SESSION"])
if cfg.ResetClaudeSession || resetEnv {
if rmErr := p.cli.VolumeRemove(ctx, claudeSessionsVolume, true); rmErr != nil {
@@ -1288,7 +1308,7 @@ func (p *Provisioner) WriteAuthTokenToVolume(ctx context.Context, workspaceID, t
if p == nil || p.cli == nil {
return ErrNoBackend
}
volName := ConfigVolumeName(workspaceID)
volName := p.resolveConfigVolumeName(ctx, workspaceID)
resp, err := p.cli.ContainerCreate(ctx, &container.Config{
Image: "alpine",
Cmd: []string{"sh", "-c", writeAuthTokenVolumeCmd()},
@@ -1315,6 +1335,33 @@ func (p *Provisioner) WriteAuthTokenToVolume(ctx context.Context, workspaceID, t
return nil
}
// resolveConfigVolumeName returns the effective config volume name for a
// workspace, preferring the legacy truncated name if that volume already
// exists (KI-013 deploy safety: pre-deploy volumes must not be orphaned).
func (p *Provisioner) resolveConfigVolumeName(ctx context.Context, workspaceID string) string {
if p == nil || p.cli == nil {
return ConfigVolumeName(workspaceID)
}
legacy := legacyConfigVolumeName(workspaceID)
if _, err := p.cli.VolumeInspect(ctx, legacy); err == nil {
return legacy
}
return ConfigVolumeName(workspaceID)
}
// resolveClaudeSessionVolumeName returns the effective claude-sessions volume
// name, preferring the legacy truncated name if that volume already exists.
func (p *Provisioner) resolveClaudeSessionVolumeName(ctx context.Context, workspaceID string) string {
if p == nil || p.cli == nil {
return ClaudeSessionVolumeName(workspaceID)
}
legacy := legacyClaudeSessionVolumeName(workspaceID)
if _, err := p.cli.VolumeInspect(ctx, legacy); err == nil {
return legacy
}
return ClaudeSessionVolumeName(workspaceID)
}
// RemoveVolume removes the config volume for a workspace.
// Also removes the claude-sessions volume (best-effort, may not exist
// for non claude-code runtimes). Issue #12.
@@ -1322,16 +1369,22 @@ func (p *Provisioner) RemoveVolume(ctx context.Context, workspaceID string) erro
if p == nil || p.cli == nil {
return ErrNoBackend
}
volName := ConfigVolumeName(workspaceID)
if err := p.cli.VolumeRemove(ctx, volName, true); err != nil {
return fmt.Errorf("failed to remove volume %s: %w", volName, err)
// KI-013 deploy safety: remove both new full-ID name and legacy
// truncated name if present, so pre-deploy volumes are not orphaned.
removed := false
for _, volName := range []string{ConfigVolumeName(workspaceID), legacyConfigVolumeName(workspaceID)} {
if err := p.cli.VolumeRemove(ctx, volName, true); err == nil {
log.Printf("Provisioner: removed config volume %s", volName)
removed = true
}
}
log.Printf("Provisioner: removed config volume %s", volName)
csName := ClaudeSessionVolumeName(workspaceID)
if rmErr := p.cli.VolumeRemove(ctx, csName, true); rmErr != nil {
log.Printf("Provisioner: claude-sessions volume cleanup warning for %s: %v", csName, rmErr)
} else {
log.Printf("Provisioner: removed claude-sessions volume %s", csName)
if !removed {
return fmt.Errorf("failed to remove config volume for %s", workspaceID)
}
for _, csName := range []string{ClaudeSessionVolumeName(workspaceID), legacyClaudeSessionVolumeName(workspaceID)} {
if rmErr := p.cli.VolumeRemove(ctx, csName, true); rmErr == nil {
log.Printf("Provisioner: removed claude-sessions volume %s", csName)
}
}
return nil
}
@@ -1354,37 +1407,34 @@ func (p *Provisioner) Stop(ctx context.Context, workspaceID string) error {
if p == nil || p.cli == nil {
return ErrNoBackend
}
name := ContainerName(workspaceID)
// Force-remove kills and removes in one atomic operation, bypassing
// the restart policy entirely.
err := p.cli.ContainerRemove(ctx, name, container.RemoveOptions{Force: true})
if err == nil {
log.Printf("Provisioner: stopped and removed container %s", name)
return nil
// KI-013 deploy safety: try new full-ID name first, then fall back to
// the old truncated name so pre-deploy containers are still stoppable.
names := []string{ContainerName(workspaceID), legacyContainerName(workspaceID)}
for _, name := range names {
// Force-remove kills and removes in one atomic operation, bypassing
// the restart policy entirely.
err := p.cli.ContainerRemove(ctx, name, container.RemoveOptions{Force: true})
if err == nil {
log.Printf("Provisioner: stopped and removed container %s", name)
return nil
}
if isContainerNotFound(err) {
// Try the next name (legacy fallback). If both miss, the
// container is genuinely gone — post-condition satisfied.
continue
}
if isRemovalInProgress(err) {
// Another concurrent caller is already removing this container.
log.Printf("Provisioner: container %s removal already in progress (no-op)", name)
return nil
}
// Real failure: daemon timeout, socket EOF, ctx cancellation, etc.
log.Printf("Provisioner: force-remove failed for %s: %v", name, err)
return fmt.Errorf("force-remove %s: %w", name, err)
}
if isContainerNotFound(err) {
// Container was already gone — the post-condition we want is
// satisfied. Don't surface as an error.
log.Printf("Provisioner: container %s already gone (no-op)", name)
return nil
}
if isRemovalInProgress(err) {
// Another concurrent caller (orphan sweeper, sibling cascade
// delete, manual `docker rm -f`) is already removing this
// container. The post-condition is the same as success: the
// container WILL be gone shortly. Surfacing this as a 500 on
// cascade-delete causes UI confusion ("workspace marked
// removed, but stop call(s) failed — please retry") even
// though retrying would just race the same in-flight removal.
log.Printf("Provisioner: container %s removal already in progress (no-op)", name)
return nil
}
// Real failure: daemon timeout, socket EOF, ctx cancellation, etc.
// Caller (workspace_crud.stopAndRemove, orphan_sweeper.sweepOnce)
// must propagate this so they can skip the follow-up RemoveVolume.
log.Printf("Provisioner: force-remove failed for %s: %v", name, err)
return fmt.Errorf("force-remove %s: %w", name, err)
// Both names missed — container was already gone.
log.Printf("Provisioner: container %s already gone (no-op)", ContainerName(workspaceID))
return nil
}
// IsRunning checks if a workspace container is currently running.
@@ -1444,16 +1494,20 @@ func RunningContainerName(ctx context.Context, cli *client.Client, workspaceID s
if cli == nil {
return "", ErrNoBackend
}
name := ContainerName(workspaceID)
info, err := cli.ContainerInspect(ctx, name)
if err != nil {
if isContainerNotFound(err) {
return "", nil
// KI-013 deploy safety: new full-ID name first, then fall back to the
// old truncated name so pre-deploy containers are still discoverable.
names := []string{ContainerName(workspaceID), legacyContainerName(workspaceID)}
for _, name := range names {
info, err := cli.ContainerInspect(ctx, name)
if err != nil {
if isContainerNotFound(err) {
continue
}
return "", err
}
if info.State.Running {
return name, nil
}
return "", err
}
if info.State.Running {
return name, nil
}
return "", nil
}
@@ -425,7 +425,7 @@ func TestContainerName(t *testing.T) {
}{
{"short", "ws-short"},
{"exactly12ch", "ws-exactly12ch"},
{"longer-than-twelve-characters", "ws-longer-than-"},
{"longer-than-twelve-characters", "ws-longer-than-twelve-characters"},
{"abc", "ws-abc"},
}
@@ -437,6 +437,17 @@ func TestContainerName(t *testing.T) {
}
}
// TestContainerName_DistinctSamePrefix12 is a regression guard for KI-013:
// two UUIDs sharing the same first 12 characters must produce distinct
// container names (the old 12-char truncation caused collisions).
func TestContainerName_DistinctSamePrefix12(t *testing.T) {
id1 := "123456789abc-4def-1234-567890abcdef"
id2 := "123456789abc-4def-1234-567890abcdf0"
if ContainerName(id1) == ContainerName(id2) {
t.Fatalf("ContainerName must differ for same-first-12 UUIDs: both = %q", ContainerName(id1))
}
}
// TestConfigVolumeName verifies config volume naming.
func TestConfigVolumeName(t *testing.T) {
tests := []struct {
@@ -445,7 +456,7 @@ func TestConfigVolumeName(t *testing.T) {
}{
{"short", "ws-short-configs"},
{"exactly12ch", "ws-exactly12ch-configs"},
{"longer-than-twelve-characters", "ws-longer-than--configs"},
{"longer-than-twelve-characters", "ws-longer-than-twelve-characters-configs"},
{"abc", "ws-abc-configs"},
}
@@ -457,10 +468,19 @@ func TestConfigVolumeName(t *testing.T) {
}
}
// TestConfigVolumeName_DistinctSamePrefix12 is a regression guard for KI-013.
func TestConfigVolumeName_DistinctSamePrefix12(t *testing.T) {
id1 := "123456789abc-4def-1234-567890abcdef"
id2 := "123456789abc-4def-1234-567890abcdf0"
if ConfigVolumeName(id1) == ConfigVolumeName(id2) {
t.Fatalf("ConfigVolumeName must differ for same-first-12 UUIDs: both = %q", ConfigVolumeName(id1))
}
}
// ---------- #12 — claude-sessions volume naming ----------
// TestClaudeSessionVolumeName_Deterministic: same ID → same volume name, and
// the name follows the ws-<id[:12]>-claude-sessions shape used everywhere
// the name follows the ws-<id>-claude-sessions shape used everywhere
// else in the provisioner.
func TestClaudeSessionVolumeName_Deterministic(t *testing.T) {
tests := []struct {
@@ -469,7 +489,7 @@ func TestClaudeSessionVolumeName_Deterministic(t *testing.T) {
}{
{"short", "ws-short-claude-sessions"},
{"exactly12ch", "ws-exactly12ch-claude-sessions"},
{"longer-than-twelve-characters", "ws-longer-than--claude-sessions"},
{"longer-than-twelve-characters", "ws-longer-than-twelve-characters-claude-sessions"},
{"abc", "ws-abc-claude-sessions"},
}
for _, tt := range tests {
@@ -484,6 +504,15 @@ func TestClaudeSessionVolumeName_Deterministic(t *testing.T) {
}
}
// TestClaudeSessionVolumeName_DistinctSamePrefix12 is a regression guard for KI-013.
func TestClaudeSessionVolumeName_DistinctSamePrefix12(t *testing.T) {
id1 := "123456789abc-4def-1234-567890abcdef"
id2 := "123456789abc-4def-1234-567890abcdf0"
if ClaudeSessionVolumeName(id1) == ClaudeSessionVolumeName(id2) {
t.Fatalf("ClaudeSessionVolumeName must differ for same-first-12 UUIDs: both = %q", ClaudeSessionVolumeName(id1))
}
}
// TestClaudeSessionVolumeName_DistinctFromConfig ensures we never alias the
// claude-sessions volume onto the config volume (deleting one must not wipe
// the other in RemoveVolume's cleanup path).
@@ -234,6 +234,13 @@ func Setup(hub *ws.Hub, broadcaster *events.Broadcaster, prov *provisioner.Provi
// this specific workspace, or a control-plane-verified tenant session.
wsAuth.PATCH("", wh.Update)
// Compute options — SSOT for the canvas Container-Config tab's cloud-
// provider + instance-type dropdowns (core#2489). Returns the same
// provider/instance metadata validateWorkspaceCompute enforces, so the UI
// can never offer a (provider, instance-type) the PATCH then rejects with
// a 400. Static (derived from the in-binary allowlist) — no DB round-trip.
wsAuth.GET("/compute-options", wh.ComputeOptions)
// Lifecycle
wsAuth.GET("/state", wh.State)
wsAuth.POST("/restart", wh.Restart)
@@ -1163,3 +1163,109 @@ func TestSanitizeUTF8(t *testing.T) {
t.Errorf("sanitizeUTF8 did not produce valid UTF-8: %x", []byte(out))
}
}
// ── TestClassifyTaskState ───────────────────────────────────────────────────
func TestClassifyTaskState_NoStatus(t *testing.T) {
result := map[string]json.RawMessage{"other": json.RawMessage(`"x"`)}
if got := classifyTaskState(result); got != "" {
t.Errorf("classifyTaskState(no status) = %q, want empty", got)
}
}
func TestClassifyTaskState_OKStates(t *testing.T) {
for _, state := range []string{"", "submitted", "working", "completed"} {
result := map[string]json.RawMessage{
"status": json.RawMessage(`{"state":"` + state + `"}`),
}
if got := classifyTaskState(result); got != "" {
t.Errorf("classifyTaskState(%q) = %q, want empty (OK state)", state, got)
}
}
}
func TestClassifyTaskState_FailureState(t *testing.T) {
result := map[string]json.RawMessage{
"status": json.RawMessage(`{"state":"failed"}`),
}
if got := classifyTaskState(result); got != "failed" {
t.Errorf("classifyTaskState(failed) = %q, want failed", got)
}
}
func TestClassifyTaskState_MalformedStatus(t *testing.T) {
result := map[string]json.RawMessage{
"status": json.RawMessage(`{broken`),
}
if got := classifyTaskState(result); got != "" {
t.Errorf("classifyTaskState(malformed) = %q, want empty", got)
}
}
// ── TestIsEmptyResponse ─────────────────────────────────────────────────────
func TestIsEmptyResponse_EmptyBody(t *testing.T) {
if !isEmptyResponse([]byte{}) {
t.Error("isEmptyResponse(empty) should be true")
}
}
func TestIsEmptyResponse_NoResponseGenerated(t *testing.T) {
if !isEmptyResponse([]byte(`(no response generated)`)) {
t.Error("isEmptyResponse(no-response-generated) should be true")
}
}
func TestIsEmptyResponse_TextFieldEmpty(t *testing.T) {
if !isEmptyResponse([]byte(`{"result":{"parts":[{"text":""}]}}`)) {
t.Error("isEmptyResponse(empty text field) should be true")
}
}
func TestIsEmptyResponse_TextFieldNoResponse(t *testing.T) {
if !isEmptyResponse([]byte(`{"result":{"parts":[{"text":"(no response generated)"}]}}`)) {
t.Error("isEmptyResponse(text=no-response-generated) should be true")
}
}
func TestIsEmptyResponse_HasContent(t *testing.T) {
if isEmptyResponse([]byte(`{"result":{"parts":[{"text":"hello"}]}}`)) {
t.Error("isEmptyResponse(with content) should be false")
}
}
// ── TestA2AErrorFromBody ────────────────────────────────────────────────────
func TestA2AErrorFromBody_Empty(t *testing.T) {
if got := a2aErrorFromBody([]byte{}); got != "" {
t.Errorf("a2aErrorFromBody(empty) = %q, want empty", got)
}
}
func TestA2AErrorFromBody_JSONRPCMessage(t *testing.T) {
body := []byte(`{"error":{"code":-32603,"message":"internal error"}}`)
if got := a2aErrorFromBody(body); got != "internal error" {
t.Errorf("a2aErrorFromBody(JSON-RPC) = %q, want internal error", got)
}
}
func TestA2AErrorFromBody_PlainString(t *testing.T) {
body := []byte(`{"error":"something went wrong"}`)
if got := a2aErrorFromBody(body); got != "something went wrong" {
t.Errorf("a2aErrorFromBody(plain) = %q, want something went wrong", got)
}
}
func TestA2AErrorFromBody_NoError(t *testing.T) {
body := []byte(`{"result":"ok"}`)
if got := a2aErrorFromBody(body); got != "" {
t.Errorf("a2aErrorFromBody(no error) = %q, want empty", got)
}
}
func TestA2AErrorFromBody_InvalidJSON(t *testing.T) {
body := []byte(`{broken`)
if got := a2aErrorFromBody(body); got != "" {
t.Errorf("a2aErrorFromBody(invalid) = %q, want empty", got)
}
}