The Create handler already validates (runtime, model) against the
provider registry (commit e53a47b4). The SetModel endpoint
(PUT /workspaces/:id/model) was the remaining unguarded save path —
a user could change the model after creation and bypass both the
model-registration gate and the derived-provider gate.
Fix:
- Query the workspace's runtime before persisting the model.
- Call validateRegisteredModelForRuntime + validateDerivedProviderInRegistry
for non-empty models, mirroring the Create handler order and error
shape (422 with code + actionable list).
- Return 404 when the workspace does not exist.
- Federation contract preserved: unknown runtimes fail-open exactly
as in Create.
Tests:
- Update existing SetModel / RoundTrip mocks to expect the runtime
lookup query.
- Add TestSecretsSetModel_UnregisteredModel_422.
- Add TestSecretsSetModel_UnknownRuntimeFailOpen_200.
- Add TestSecretsSetModel_WorkspaceNotFound_404.
Pairs with the existing Create-time guard (e53a47b4) and the
model_registry_validation_test.go regression suite.
SOP: /sop-ack engineer-ack as fullstack-engineer
ci-arm64-advisory / fast-checks (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Shellcheck (E2E scripts) (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Canvas Deploy Reminder (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
ci-arm64-advisory / fast-checks (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Shellcheck (E2E scripts) (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Canvas Deploy Reminder (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Python Lint & Test (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
lint-continue-on-error-tracking / lint-continue-on-error-tracking (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
The `E2E API Smoke Test` REQUIRED gate (and the sibling local-platform E2E
workflows) started the platform in the background and waited for /health with
a fixed 30×1s loop (~30s). The platform binds /health only AFTER applying the
FULL migration chain on cold start; that chain now reaches past the 30s window
(the run log gets to 20260523000000_schedule_consecutive_sdk_errors.up.sql
before "Platform starting on :PORT"), so the health loop expired before the
server was reachable → downstream E2E never ran → main went red. A fixed budget
is brittle by construction because the migration chain grows every release.
Fix (deterministic, not a bigger magic number):
- Poll /health on a generous, clearly-commented wall-clock budget (180s) that
comfortably exceeds cold-start + full-migration time and is robust to the
chain continuing to grow. /health returning 200 is the real readiness signal
(migrations done + server listening).
- Still fail fast + loud on a genuinely dead platform: if the backgrounded
platform-server PID has exited (e.g. a broken migration crashed it), stop
immediately and dump the platform log — we never mask a real startup failure,
and we never wait out the full budget for a process that is already gone.
- On true timeout, dump the platform log tail and fail with ::error::.
Applied identically to the four workflows sharing the 30×1s platform-/health
pattern: e2e-api, e2e-chat, e2e-peer-visibility, e2e-legacy-advisory. The
unrelated Postgres-readiness `seq 1 30` waits (which are not gated on the
migration chain) are intentionally left unchanged.
curl usage avoids the -w '%{http_code}' status-capture shape, so
lint-curl-status-capture passes; lint-workflow-yaml passes on all 56 files.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ci-arm64-advisory / fast-checks (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Canvas Deploy Reminder (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Shellcheck (E2E scripts) (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
Step 8 of the full-lifecycle SaaS canary sends an A2A round-trip to the
parent and asserts a PONG. When the configured completion backend returns
a 2xx with no text part (empty content / tool_calls-or-reasoning-only),
the agent runtime surfaces the literal reply "Error: message contained no
text content." Today that fell through the generic "error|exception"
catch-all and was reported as a vague "A2A returned an error-shaped
response", which misdirects triage to workspace-server.
Add a specific error-class check (mirroring the existing hermes-401 /
quota-exhausted patterns) that names this as a model/provider BACKEND
regression with the operator action, before the generic catch-all. No
behaviour change for healthy runs; the failure still hard-fails — it is
just diagnosed correctly.
Observed 2026-06-03/04: 100% of staging canaries on MODEL_SLUG=MiniMax-M2
(canary default since #2710) hit this on the parent's first cold turn,
identical on main's scheduled synthetic E2E and on open PRs — i.e. an
environmental backend regression, not PR-introduced. This is purely a
diagnostic-precision improvement to the unmodified main-line step-8 block.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
E2E Staging Canvas (Playwright) / "Canvas tabs E2E" went red on main HEAD
b9d2f023. The actual failure (runner-6 task 258160) is in the Playwright
globalSetup, NOT in any spec assertion:
[staging-setup] Workspace created: 8e5c7354-...
Error: Workspace failed: (no last_sample_error) full body:
{... "runtime":"hermes","status":"failed","uptime_seconds":0,
"last_sample_error":null ...}
at canvas/e2e/staging-setup.ts:272 (waitFor "workspace online")
Root cause — NOT a canvas/test regression and NOT timing fragility. It is
a deterministic consequence of workspace-server #2162 (merged 2026-06-03,
"platform-managed workspace must fail-closed when CP proxy env absent"),
which is a correct production safety fix. The canvas E2E creates a bare
hermes/gpt-4o workspace that defaults closed to platform_managed; on a
staging tenant without MOLECULE_LLM_BASE_URL / MOLECULE_LLM_USAGE_TOKEN,
the agent now aborts at boot with MISSING_PLATFORM_PROXY — surfacing as
the pre-start credential-abort shape (status:"failed", uptime_seconds:0,
no last_sample_error). Pre-#2162 the same workspace booted credential-less
(the bug #2162 fixed) so the old harness happened to pass.
The fix is in the harness, because this test does not need a booted agent:
staging-tabs.spec.ts only opens the 13 side-panel tabs and asserts no hard
crash / no "Failed to load" toast. It makes zero LLM calls and even mocks
/cp/auth/me + 401→200. All it needs is a workspace ROW so the node + tabs
render.
So step 6 now waits for RENDERABLE instead of strictly online:
- online -> happy path (staging with proxy env)
- failed + uptime_seconds==0 + no sample -> pre-start credential-abort:
agent never ran, row still renders -> proceed, with a loud console.warn
- any other failed (last_sample_error present, OR uptime_seconds>0 i.e.
the agent started then crashed) -> still hard-throws (no masking)
Real infra/provision failure stays loud one step earlier at the org level
(instance_status === "failed", unchanged).
Verification: tsc clean for canvas/e2e/staging-* (pre-existing tsc errors
are all in unrelated __tests__ files); `playwright test --list` resolves
globalSetup + the single spec. Full live run needs staging CP creds not
available locally; the changed branch is the globalSetup readiness gate,
verified by inspection against the captured failing-run body.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The canvas-readiness loop added in PR #2195 captured the curl status
into CODE with `CODE=$(curl -s -o /dev/null -w '%{http_code}' ...
|| echo 000)`. That shape is exactly the BAD_STATUS_CAPTURE pattern
that .gitea/scripts/lint-curl-status-capture.py rejects — curl -w can
write a status to stdout before the || echo 000 fallback fires,
producing polluted values such as a concatenated status string rather
than one code.
Adopt the lint-approved tempfile pattern already used by
e2e-staging-external.yml (set +e / curl -w '...' > file / set -e /
cat file || echo '000') so the captured value is always a clean HTTP
code or '000'.
Closes#2198 (main-red after #2195).
Closes#2199 (auto-filed main-red watchdog, root cause identical to #2198).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ci-arm64-advisory / fast-checks (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Shellcheck (E2E scripts) (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Canvas Deploy Reminder (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Python Lint & Test (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
lint-continue-on-error-tracking / lint-continue-on-error-tracking (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
The moonshot/kimi incident: a canvas-created claude-code workspace with
provider=Platform + model=moonshot/kimi-k2.6 booted NOT_CONFIGURED in prod
because the generated config.yaml lacked the manifest-derived `provider:`
key, so the adapter slash-split "moonshot/..." -> unregistered provider.
Fixed by #2187 (ensureDefaultConfig stamps DeriveProvider->provider:platform)
+ #2188 (canvas). Unit tests passed; the REAL boot path was the gap.
This adds comprehensive regression coverage so the CLASS cannot reship:
Deterministic (no live infra, runs in the normal unit suite):
workspace-server/internal/handlers/workspace_provision_platform_boot_test.go
- TestEnsureDefaultConfig_StampsProviderForEverySSOTPlatformModel:
enumerates the claude-code `platform` arm from the providers SSOT
(providers.LoadManifest) and asserts ensureDefaultConfig stamps
provider:platform (top-level AND runtime_config) for EVERY offered
platform model — not just the single moonshot/kimi pin #2187 shipped.
A newly-offered platform model gets a case for free and only passes if
actually stamped (closes the offered-but-not-stamped divergence the bug
rode in on). Mutation-verified: disabling the stamp fails the test.
- TestPlatformModelDeriveProvider_SSOTConsistency: the upstream half —
DeriveProvider maps every SSOT platform model to provider Name "platform".
Real-boot (staging; I will run it):
Extends the existing staging harness (no new harness) with a
platform-managed path: E2E_LLM_PATH=platform pin-selects moonshot/kimi-k2.6,
sends NO tenant key, and reuses the harness's online-wait + completion
assertions to prove the workspace reaches status=online (not
not_configured) and a completion returns 200. The BYOK branches never
exercised the platform arm — the exact arm the bug shipped on.
- tests/e2e/lib/model_slug.sh: platform path + override semantics
- tests/e2e/test_model_slug.sh: 4 new pinned cases (16/16 green)
- tests/e2e/test_staging_full_saas.sh: empty-secrets platform branch
- .gitea/workflows/e2e-staging-saas.yml: new `E2E Staging Platform Boot`
job (continue-on-error during de-flake; bp-required: pending #2187),
+ providers.yaml/model_slug.sh added to the path triggers.
Coverage-audit theme: mc#1982 (continue-on-error masks; de-flake-then-gate).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Root cause of the intermittent `E2E Chat / E2E Chat` red
(`chat-mobile.spec.ts › history persists across reload`) is a REAL
product persistence race, not test fragility.
The push-mode A2A success path (`logA2ASuccess`) wrote the `a2a_receive`
activity_logs row — the ONLY durable record of a chat round-trip
(request_body = user message, response_body = agent reply, both read
back by chat-history hydration) — in a DETACHED goroutine via `goAsync`.
`ProxyA2A` flushes the HTTP 200 (carrying the reply) the moment
`proxyA2ARequest` returns, i.e. BEFORE that goroutine's INSERT commits.
The test's `page.reload()` then fires `GET /chat-history`, which reads
activity_logs and can miss the not-yet-committed row → "Mobile
persistence" absent → red. Outside the test the same window loses the
message on a reload / workspace-server restart / deploy / OOM between the
200 and the goroutine commit.
The poll-mode sibling path (`logA2AReceiveQueued` /
`persistUserMessageAtIngest`) was already made synchronous for exactly
this incident class (internal#470 / #1347 / RFC#2945). The push-mode
counterpart was left async — fixed here by writing the row inline
(context.WithoutCancel so a chat-exit disconnect can't abort it; still
best-effort so a DB hiccup never fails the user's send). The 200 is now
emitted only after the durable row exists.
Secondary determinism hardening:
- chat-mobile spec: after reload, deterministically wait for the
`GET /chat-history` 2xx that rehydrates the transcript before asserting
visibility, instead of racing a fixed 5s render timeout against an
in-flight fetch.
- e2e-chat.yml canvas readiness: probe the real `/?m=chat` route for a
2xx (Turbopack compiles routes lazily — a bare `curl /` 200s before the
page the tests load has compiled) and raise the cold-start budget
30s→120s to kill the `Canvas did not start in 30s` flake.
Verification: `go build`, `go vet`, full `internal/handlers` +
`internal/messagestore` test suites green (sqlmock, no DB needed);
Playwright spec compiles + lists; eslint clean. Browser E2E not run
locally (needs Postgres+Redis+platform+canvas servers).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
publish-workspace-server-image / Production auto-deploy intermittently
fails on main with:
::error::<slug> is stale: actual=<newerSHA>, expected=<thisSHA>
Root cause: the workflow deliberately has no `concurrency:` (Gitea
1.22.6 cancels queued runs even with cancel-in-progress:false, which is
unacceptable for a prod deploy). So when two main pushes land close
together (eb31bcf then 286338), BOTH deploy-production jobs run. The
newer job (286338 -> staging-2863380) rolls the fleet forward first;
then the OLDER job (eb31bcf) runs "Verify reachable tenants report this
SHA", sees tenants on 2863380, and fails on STRICT SHA EQUALITY — even
though the fleet is AHEAD, not behind. Git SHAs aren't ordered and
/buildinfo exposes only git_sha (no build time / monotonic number), so
the verify can't tell "ahead" from "behind" on its own.
Fix (option b — superseded-job detection): before the strict verify,
ask Gitea for the current head of the deploy branch (main). If it is no
longer this job's GITHUB_SHA, a newer commit has landed and this deploy
is superseded; the newest job's verify is authoritative. Log a notice
and exit success, skipping strict equality for the stale job.
Why this preserves real-stale detection:
- Only the SUPERSEDED (older) job skips strict verify. The LATEST deploy
job (head == its SHA) still runs strict equality, so a genuinely
behind/older tenant still fails loudly.
- Fail-safe: if the branch head can't be read (no token / API error) or
equals our SHA, superseded_by returns None -> strict verify runs. An
unreadable head never silently greens a deploy.
Why not the alternatives:
- (a) build-timestamp/monotonic compare: /buildinfo returns only
{git_sha} (router.go, buildinfo.go). Adding a build-time field needs a
workspace-server binary + Dockerfile change and a full fleet rebuild
before it can be relied on — heavy and slow to take effect.
- (c) concurrency: forbidden by the workflow header (Gitea cancels
queued prod deploys).
Verification:
- New unit tests for superseded_by / current_branch_head and the
fail-safe path; full suite 33 passed.
- Workflow yaml-lint clean (lint-workflow-yaml.py).
- CLI smoke test: eb31bcf-vs-2863380 -> exit 0 (skip, success);
latest job -> exit 10 (run strict verify); unreadable head -> exit 10.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ci-arm64-advisory / fast-checks (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Shellcheck (E2E scripts) (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
CI / Canvas Deploy Reminder (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
ci-arm64-advisory / fast-checks (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
The two org-template repos were intentionally deleted; manifest.json still
referenced them so clone-manifest.sh 404'd → build-and-push failed → main red.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Root cause: the Create Workspace dialog built its provider→model dropdown
catalog with the LEGACY buildProviderCatalog(llmModels), whose inferVendor
heuristic slash-splits a platform model id like `moonshot/kimi-k2.6` into
vendor `moonshot`. There was therefore no `Platform` bucket and the create
payload sent `llm_provider: "moonshot"` for a platform-managed model.
ConfigTab was migrated to the registry-backed catalog
(buildProviderCatalogFromRegistry) in internal#718 P3; CreateWorkspaceDialog
was not. This mirrors that migration:
- Thread the registry fields (registry_backed / registry_providers /
registry_models) — already returned by GET /templates — through TemplateSpec.
- When the selected runtime's /templates row is registry_backed, build the
catalog from registry_providers/registry_models (each model carries its
DERIVED provider, e.g. moonshot/kimi-k2.6 → "platform"), feed the selector
the registry models, and pass the prebuilt catalog verbatim. Restores the
`Platform` bucket and makes the payload send `llm_provider: platform`.
- Non-registry runtimes / older backends keep the legacy buildProviderCatalog
fallback unchanged.
Tests: added a registry-backed claude-code fixture whose plain models[] is
UN-annotated (so the legacy path would mis-bucket to "moonshot"), asserting the
Platform bucket appears and selecting moonshot/kimi-k2.6 yields
llm_provider: platform; plus a MiniMax derived-provider/BYOK case. Verified the
3 new tests FAIL on the pre-fix code and PASS after. Full canvas suite: 3334
passed / 3 skipped. tsc: 0 new errors (223→223, all pre-existing test Mock
drift). eslint clean on touched files.
Fix C of the RFC#340 convergence (cosmetic/UX, client-only, no serving-path
risk). Fix A (workspace-server) is the boot fix.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A canvas-created claude-code workspace with model moonshot/kimi-k2.6 booted
NOT_CONFIGURED: the adapter slash-split the model id to provider="moonshot",
which is not in the providers registry. CP bakes `provider: platform` via
heredoc, but the cp#329 config-bundle fetch overwrites /configs/config.yaml
with the (previously providerless) bundle version, so molecule-runtime
config.py re-derived the wrong provider and the adapter raised ValueError.
Fix A: in ensureDefaultConfig, derive the provider via the SAME providers
manifest path the config-SAVE validators use (providerRegistry() +
Manifest.DeriveProvider, nil auth env) and stamp it into config.yaml at both
the top level and under runtime_config, mirroring CP's buildModelProviderYAML
shape. The derive uses the FULL un-normalized model id so the exact-id match
resolves moonshot/kimi-k2.6 -> platform before claude-code normalization
strips the slash prefix.
Fail-open: a derive miss (unregistered model, unknown runtime, registry
unavailable) omits the provider field entirely — preserving today's behavior;
provisioning never fails on a miss. The existing template providers: registry
block injection is unchanged.
Tests: assert provider=platform (top-level + runtime_config) for claude-code +
moonshot/kimi-k2.6, and assert no provider: key for an unregistered model.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ci-arm64-advisory / fast-checks (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
Stale :latest reverted a production tenant (molecule-adk-demo,
2026-06-03). This workflow builds + pushes molecule-ai/platform-tenant
as :staging-<sha> + :staging-latest on every main build, but never
re-points :latest. So :latest stayed pinned to the 2026-05-10 build
(3.5 weeks stale). A no-arg POST /cp/admin/tenants/:slug/redeploy whose
default tag fell through to "latest" then pulled that stale image and
reverted the tenant.
Add a "Promote :latest" step to the deploy-production job that re-points
:latest (prod + staging ECR) to the just-shipped staging-<sha> image.
DESIGN — promote point, NOT raw build: the step lives at the END of
deploy-production, after wait-ci (green main CI) + the canary-first
batched fleet rollout + /buildinfo SHA verification. So :latest only
advances to a SHA that is actually green and confirmed running across
the live fleet — :latest == "current prod image", never a raw build
that might later fail the gate. If PROD_AUTO_DEPLOY is disabled, :latest
is correctly NOT advanced (an unpromoted build must not become :latest).
:staging-latest remains the rolling raw-build pointer for staging/E2E.
Re-tag is digest-level (docker buildx imagetools create) — no rebuild;
:latest is byte-identical to :staging-<sha> for that commit.
Pairs with molecule-controlplane change that flips the no-arg redeploy
default from :latest to :staging-latest (defense-in-depth).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Catches the adk-demo Assistant boot failure class (2026-06-03):
workspace config model=moonshot/kimi-k2.6 (claude-code)
→ adapter derives provider=moonshot
→ ValueError: provider=moonshot not in providers registry
→ save was accepted, agent wedged at boot, CI never saw it
The drift gate (RFC#580) validates templates; the existing model-side
validator (validateRegisteredModelForRuntime, P4 PR-2) catches a
(runtime, model) the runtime doesn't own. Neither checked the
DERIVED provider's membership in providers.yaml — the gate the
adapter actually trips at boot.
Fix (issue #2172, fail-closed at config-SAVE):
* validateDerivedProviderInRegistry (this PR) — load the manifest,
call DeriveProvider(runtime, model, nil) to get the provider the
adapter will resolve, and assert the provider name is in the
providers list. Returns 422 DERIVED_PROVIDER_NOT_IN_REGISTRY with
the sorted list of valid providers (actionable, unlike the
boot-time ValueError). Federation contract mirrored from the
model-side check (langgraph/external/kimi/mock pass through).
* Wired into CreateWorkspace after the existing model-side check.
Both gates fail-closed for first-party runtimes and fail-open for
non-registry / federated runtimes — the same shape.
* TestRegistryConsistency_AllNativeModelsDeriveToKnownProvider —
the static regression gate the issue asks for ('a CI test fails
if any shipped demo/template config references an unregistered
provider'), generalized to the catalog: walk every (runtime,
model) in the native model sets and assert each one derives to
a provider in the providers list. By construction always true
today, but fires on any future drift between providers: and
runtimes: in providers.yaml (the exact class cp#455 / boot-e2e
targets at the runtime layer).
* TestValidateDerivedProviderInRegistry — table-driven pass/fail
coverage mirroring TestValidateRegisteredModelForRuntime, plus
the langgraph / external / empty-model fail-open cases.
Pairs with cp#455 boot-to-registration e2e (the deep runtime layer);
this is the fast static layer the issue asked for. Reverts cleanly
by deleting the new validator + the wire-up in workspace.go.
SOP: /sop-ack engineer-ack as fullstack-engineer
Tested: build drift pre-checked; test cases pin both happy path
and the federation contract.
The ${INTEGRATION_DB_URL%%@*} pattern strips only the host portion,
leaving the user:password prefix exposed in CI logs. Replace with a
static confirmation string.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
**Step A — Go-level fail-closed**
Extract a shared `requireIntegrationDBURL(t)` helper into
`integration_helper_test.go` (build-tag: integration). The helper:
- Returns $INTEGRATION_DB_URL when present
- Calls `t.Fatalf` when the URL is empty AND any CI marker is set
(`CI`, `GITHUB_ACTIONS`, or `GITEA_ACTIONS`), preventing a silent
skip-to-green in CI
- Calls `t.Skip` when the URL is empty AND no CI marker is set,
preserving the local-dev ergonomics
Update all three integration test files to use the shared helper:
- delegation_ledger_integration_test.go
- pending_uploads_integration_test.go
- workspace_create_name_integration_test.go
This closes the Go-level fail-open where a missing INTEGRATION_DB_URL
in CI would cause every integration test to skip and report PASS.
**Step C — Workflow bash preflight**
Add a `Preflight — INTEGRATION_DB_URL must be present` step in
`.gitea/workflows/handlers-postgres-integration.yml` immediately before
the `go test` invocation. If the postgres-start step failed to export
the variable, the preflight exits 1 with `::error::` so the job fails
loud before the test binary can even start.
**Step B — Workflow CoE mask**
ALREADY FIXED in current main: both `detect-changes` and `integration`
jobs have `continue-on-error: false` (lines 93 and 125). The context is
already listed in `audit-force-merge.yml` REQUIRED_CHECKS_JSON for
`main`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace the single-field _get_status_updated_at with a richer
_get_status_snapshot that captures status id, updated_at, and target_url.
Add _extract_run_id helper to parse the Actions run_id from the
status target_url (Gitea 1.22.6 lacks REST /actions/runs/* endpoints,
so the run_id embedded in target_url is the strongest available proxy
for distinct run_id).
_poll_fresh_statuses now considers a status fresh if ANY of the
following changed from the pre-review snapshot: updated_at, id, or
target_url. This catches both timestamp-only updates and new-run
indicators.
In the test body, collect pre-existing run_ids before submitting the
APPROVED review. After polling, assert that each required context's
fresh status either has no target_url/run_id (cannot verify) or points
to a run_id that did NOT exist before the review. This proves the
status was posted by a NEW workflow run triggered from the
pull_request_review event, not merely updated in-place by an earlier
run.
Findings 2 & 3 (APPROVED spelling, HTTPError body double-read) were
already fixed in commit 77573074.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Use RFC3339Nano + 200ms gaps in BeforeTS test to avoid second-
truncation and Go/Postgres clock skew.
- Pre-set attempts=5 on seeded A2A queue item so MarkQueueItemFailed
transitions to 'failed' on first call (attempts are normally
incremented by DequeueNext, which the test bypasses).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Set peer role on seeded workspace so peer_role is populated in
?include=peer_info response (handler omits empty peer fields).
- Use valid UUID instead of empty string for caller_id in
seedA2AQueueItem to satisfy UUID column constraint.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
TestIntegration_ActivityList_Basic panicked with a nil pointer
dereference at activity.go:512 because gin.CreateTestContext returns
a context with c.Request == nil, and List() calls c.Request.Context().
Add a dummy httptest.NewRequest to newTestGinContext() so every test
that uses the helper has a non-nil request.
Relates to #2151.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces t.Skip with t.Fatal in the integration helper so that a
missing INTEGRATION_DB_URL env var surfaces as a hard failure rather
than a silent skip. The skip pattern is a fail-open dark-wedge: CI
could misconfigure the env, every test skips, and the gate reports
GREEN while exercising zero code.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Changes NewActivityHandler and NewDelegationHandler to accept the
narrow events.EventEmitter interface instead of *events.Broadcaster.
This aligns with WorkspaceHandler (already interface-typed) and lets
integration tests substitute noOpEmitter{} without standing up Redis.
No production callers affected — *events.Broadcaster still satisfies
the interface via the existing compile-time assertion.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Scaffold file with integrationDB helper, seed fixtures, and 4 starter
real-Postgres tests:
- TestIntegration_ActivityList_Basic
- TestIntegration_DelegationList_Basic
- TestIntegration_A2AQueue_EnqueueAndDepth
- TestIntegration_A2AQueue_DequeueNext
TODO markers for the full CRUD matrix awaiting spec delivery.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CR2 blocking finding: the test registered waitForHandlerAsyncBeforeDBCleanup
BEFORE setupTestDB/setupTestRedis, which meant LIFO cleanup executed:
1. Redis close
2. db.DB restore
3. asyncWG wait
This caused the async goroutine (which accesses DB + Redis) to potentially
run against cleaned-up resources.
Fix: move waitForHandlerAsyncBeforeDBCleanup AFTER setupTestDB/setupTestRedis
so LIFO order becomes:
1. asyncWG wait (drain goroutines)
2. db.DB restore
3. Redis close
Matches the pattern already used in TestGracefulPreRestart_Success,
_NotImplemented, and _ConnectionRefused.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ci-arm64-advisory / fast-checks (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
The arm64-pilot workflow was failing the 'Identify runner' step when a
runner with label 'arm64-darwin' was not actually arm64. Because the
step lacked continue-on-error, the job failed → posted failure status
→ triggered main-red watchdog.
Changes:
- Identify runner: add id + continue-on-error; emit GITHUB_OUTPUT flag
'arm64' so subsequent steps can conditional-skip gracefully.
- Checkout, Install, Run steps: gate on steps.identify.outputs.arm64.
- Install step: detect Darwin vs Linux and download the correct
shellcheck binary (darwin.aarch64 vs linux.aarch64). Previously
always downloaded the Linux binary, which won't run on macOS.
- Run step: verify shellcheck is actually executable (not just in
PATH) before attempting to lint.
Fixes#2146
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- test_gate_auto_fire_live.py: change review event from \"APPROVE\" to
\"APPROVED\" to match Gitea API contract.
- Add _get_status_updated_at() to capture pre-existing status timestamps
before review submission.
- Add _poll_fresh_statuses() that only accepts statuses whose updated_at
differs from the pre-existing record, proving the context was posted
AFTER the review rather than tolerating stale contexts.
- Remove misleading \"tolerate stale contexts\" comment.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1. DOC - runbooks/dev-sop.md:
- Documents the Gitea PR-head workflow-selection rule (workflows load
from PR head, not base).
- Describes the standard core-PR flow: auto-fire for fresh heads,
slash-refire fallback for stale heads.
- Provides quick-check curl command and rebase vs. slash-refire guidance.
2. LIVE-FIRE TEST - test_gate_auto_fire_live.py:
- Runtime verification that submitting an APPROVED review to a PR whose
head contains the current gate workflows causes Gitea Actions to queue
qa-review + security-review and POST the BP-required contexts.
- Fix: handle string trigger form in addition to list/dict.
3. STALE-HEAD DIAGNOSTIC - test_gate_stale_head_diagnostic.py:
- Local-checkout baseline + optional PR_NUMBER mode.
- Fix: avoid double exc.read() on HTTPError (always returned empty).
- Fix: handle string trigger form.
CR round-2 fixes:
- Reverted out-of-scope Go changes that accidentally reverted the #2162
platform-managed fail-closed guard.
- Restored regression tests and env-mocking that were removed from Go tests.
The lint-pre-flip-continue-on-error gate was grepping ``::error::`` in
raw run logs without distinguishing actual execution output from script
source displayed inside ``::group::Run`` blocks. Bash workflows that
defensively contain ``echo \"::error::...\"`` branches (e.g. Postgres
port-resolution failure handlers) caused false-positive "masked run"
verdicts even when those branches were never executed.
Fix: track ``::group::Run`` / ``::endgroup::`` state while scanning the
log, skipping lines inside script-source display blocks. Also add a
heuristic guard for ``echo "::error::"`` on the same line.
This unblocks the two real-infra workflow flips in this PR.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
#2167 was accidentally merged to the staging branch instead of main; the
belt (cp#477) + workspace-provision fail-closed (#2164) are already on main,
but this tenant-server boot assertion (assertManagedTenantHasLLMEnv) was not.
Cherry-picked from ffd1bb7f. Conflict in a2a_proxy_helpers.go (an unused
canvasUserMessage struct removal incidental to #2167) resolved by keeping
main's version — the suspenders fix is self-contained in cp_config.go + main.go.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Flips continue-on-error: true -> false on the two real-infra jobs:
- Handlers Postgres Integration
- E2E API Smoke Test
These contexts are already listed as required on branch protection,
but the mask made each job report success even when its steps failed,
so the required gate could never actually block a bad merge.
If CI surfaces broken underlying tests on this PR, root-fix them —
do NOT renew the mask.
Closes#2152
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previously PatchAbilities applied broadcast_enabled and
talk_to_user_enabled with two separate UPDATE statements. If the first
succeeded and the second failed, the workspace was left in a partial/
ambiguous capability state.
When both fields are present in the PATCH body, apply them in a single
combined UPDATE so the mutation is all-or-nothing. Single-field updates
continue to use the original per-column statements.
Updates the existing BothFields test to expect one combined UPDATE, and
replaces the old BothFields_BroadcastFails test with
BothFields_UpdateError which validates the atomic path.
Fixes#2131
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds error logging for silently-discarded returns in:
- bundle/importer.go: broadcast on provision-failed
- channels/manager.go: broadcast on inbound/outbound messages
- channels/telegram.go: bot.Send callback ack and edit message
- handlers/approvals.go: broadcast on approval create/escalate/decide
Does not change control flow; purely observability.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mc#1264: 7 tests fail under parallel CI execution with sqlmock
"was not expected" errors. Root cause is untracked goroutines
from RestartByID (sendRestartContext) that access db.DB after the
sqlmock is closed and db.DB is restored to the previous mock.
Fix: wrap the sendRestartContext goroutine in runRestartCycle with
h.goAsync so it is tracked by asyncWG. Tests that call
waitForHandlerAsyncBeforeDBCleanup will now wait for this goroutine
before restoring db.DB, preventing cross-test pollution.
Also fix TestGracefulPreRestart_* tests to call
waitForHandlerAsyncBeforeDBCleanup BEFORE setupTestDB, ensuring
LIFO order is: async wait → db.DB restore. Previously, async
cleanup was registered after setupTestDB, running before db.DB
restoration and leaving goroutines to hit the next test's mock.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Defensive hygiene for TestIntegration_BroadcastOrgRoot_NonRootSenderResolvesToRoot:
if a prior run crashed or was killed before t.Cleanup fired, stale rows
with the same itest-bcastroot-* prefix may remain in the shared integration
DB and collide on workspaces_parent_name_uniq. Delete them before inserting.
No production logic changed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 03:33:40 +00:00
57 changed files with 3966 additions and 176 deletions
if [ "$code" = "200" ] || [ "$code" = "204" ]; then
echo "[teardown] deleted $slug (HTTP $code)"
else
echo "::warning::platform-boot teardown for $slug returned HTTP $code — sweep-stale-e2e-orgs will catch it within ~45 min. Body: $(head -c 300 /tmp/plat-cleanup.out 2>/dev/null)"
leaks+=("$slug")
fi
done
if [ ${#leaks[@]} -gt 0 ]; then
echo "::warning::platform-boot teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
if [ "$SUPERSEDED_EXIT" -eq 0 ] && [ -n "$NEWER_HEAD" ]; then
echo "::notice::Superseded deploy: main head is now ${NEWER_HEAD:0:7} (this job deployed ${GITHUB_SHA:0:7}). The fleet is at or ahead of this build; the newer deploy job's verify is authoritative. Skipping strict SHA verify."
> Applies to: all core-PR authors and reviewers on `molecule-core` and sibling
> repos using the `qa-review` + `security-review` branch-protection gates.
---
## 1. Gitea PR-head workflow-selection rule
**Rule:** For `pull_request_target` and `pull_request_review` events, Gitea
loads the workflow definition from the **PR's HEAD branch**, not from the
base (`main`) branch.
This is different from GitHub Actions, where `pull_request_target` always
loads workflows from the base branch. Gitea's behaviour means:
- A PR that was opened **before** the `pull_request_review` trigger was added
to `qa-review.yml` / `security-review.yml` will **NOT** auto-fire on review,
because its HEAD still contains the old workflow YAML (no trigger).
- A PR that was opened **after** the trigger was added (or that has been
rebased onto a commit containing the trigger) **WILL** auto-fire, because its
HEAD contains the new workflow YAML.
### Ops implication
| PR head contains `pull_request_review` trigger? | Behaviour on APPROVED review |
|---|---|
| **Yes** (cut from current main, or rebased) | Workflows auto-queue, evaluate, and POST the `(pull_request_target)` context automatically. No slash-command needed. |
| **No** (stale head, opened before #2157) | Nothing fires. Use `/qa-recheck` + `/security-recheck` slash-commands in a PR comment, OR rebase onto current main. |
---
## 2. Standard core-PR flow (post-#2157)
```
1. Author opens PR from a branch based on current main
→ qa-review + security-review workflows run on pull_request_target
→ status contexts post (initial eval, usually red until reviews land)
2. Reviewers submit real APPROVED reviews
→ If PR head has the trigger: workflows AUTO-FIRE on pull_request_review
→ Contexts flip green (or stay red if reviewer is not in team)
3. [Optional] If contexts did not flip (stale head, event lost, etc.):
→ Anyone can comment `/qa-recheck` or `/security-recheck`
→ sop-checklist.yml refires the evaluator (read-only, idempotent)
4. Both qa-review + security-review contexts are green
→ Plain Do:merge (no force-merge needed)
```
### Key point
The `/qa-recheck` and `/security-recheck` commands are a **backstop**, not the
primary path. PRs cut from current main should auto-fire without manual
intervention.
---
## 3. Diagnosing a stale head
If a PR has real team-member APPROVED reviews but the qa/security contexts
remain red and no workflow run appears on the PR's "Actions" tab for the
review event, the PR head is likely stale.
### Quick check
```bash
# From the PR page, look at the head commit SHA, then:
| **Rebase onto current main** | PR is genuinely stale (head lacks trigger OR head is far behind main) | Clean history, gets all recent fixes, but requires force-push and re-approval if the branch was protected |
| **`/qa-recheck` + `/security-recheck`** | PR head is recent but the review event was missed, or you want to avoid rebase churn | Quick, no force-push, but does NOT fix a missing trigger in the head |
**Do not** use slash-refire as a substitute for rebasing a stale head. If the
workflow YAML in the PR head does not contain `pull_request_review`, no amount
of rechecking will make auto-fire work.
---
## 5. Live-fire verification
The `test_gate_auto_fire_live.py` regression test exercises the full runtime
path: it submits an APPROVED review to a test PR and polls for the
`(pull_request_target)` status contexts. It is skipped when no API token is
available, and is intended to catch runtime non-fire that static structural
# All empty → '{}' (workspace will fail at first turn with an
# expected, actionable auth error rather than masking the test).
SECRETS_JSON='{}'
if[ -n "${E2E_MINIMAX_API_KEY:-}"];then
# Platform-managed path (E2E_LLM_PATH=platform) — the moonshot/kimi
# NOT_CONFIGURED regression (RFC#340 Fix A #2187). Molecule owns billing via the
# CP LLM proxy, so the workspace needs NO tenant key: provision with empty
# secrets and let the workspace boot purely on (a) the proxy env the control
# plane injects + (b) the manifest-derived `provider: platform` Fix A stamps into
# the generated config.yaml. This is the path that booted NOT_CONFIGURED in prod
# precisely because the BYOK branches below never exercise it. We deliberately
# skip the key-injection branches so a stray E2E_*_API_KEY in the runner env
# cannot silently convert this into a BYOK run and mask the regression.
if["${E2E_LLM_PATH:-}"="platform"];then
log " LLM path: PLATFORM-MANAGED (no tenant key; proxy + Fix A provider stamp)"
SECRETS_JSON='{}'
elif[ -n "${E2E_MINIMAX_API_KEY:-}"];then
SECRETS_JSON=$(python3 -c "
import json, os
k = os.environ['E2E_MINIMAX_API_KEY']
@@ -858,6 +870,24 @@ fi
ifecho"$AGENT_TEXT"| grep -qiE "exceeded your current quota|insufficient_quota";then
fail "A2A — PROVIDER QUOTA EXHAUSTED (NOT a platform regression). Operator action: top up MOLECULE_STAGING_OPENAI_API_KEY billing or rotate to a higher-quota org at Settings → Secrets and Variables → Actions. Tracked in #2578. Raw: $AGENT_TEXT"
fi
# Empty-completion class — the agent runtime reached the LLM and got a
# 2xx back, but the assistant turn carried NO text part (empty content,
# or tool_calls/reasoning-only with no surfaced text), so the runtime
# returns the literal "Error: message contained no text content." as its
# reply text. Steps 0-7 passing means the platform is healthy (CP up,
# break is the configured completion BACKEND returning an empty turn — a
# model/provider-side regression, NOT a workspace-server or harness bug,
# and NOT NOT_CONFIGURED (that fails earlier, at boot). Name it explicitly
# so the canary alert points at the model, not the platform: a generic
# "error-shaped response" misdirects triage to workspace-server. Observed
# 2026-06-03/04 across every staging canary on MODEL_SLUG=MiniMax-M2 (the
# canary default since #2710) — 100% on the parent's first cold turn,
# identical on main's scheduled synthetic E2E and on PRs (so it is an
# environmental backend regression, never PR-introduced).
ifecho"$AGENT_TEXT"| grep -qiF "message contained no text content";then
fail "A2A — EMPTY COMPLETION (backend regression, NOT a platform/workspace-server bug). The configured model (MODEL_SLUG=${MODEL_SLUG:-?}) returned a 2xx completion with no text part; the runtime surfaced 'message contained no text content.'. Operator action: check the staging LLM backend / proxy for the canary model (MiniMax-M2 since #2710) — empty assistant turns, not an auth/quota/boot fault. Raw: $AGENT_TEXT"
fi
# Generic catch-all — falls through if none of the known regressions hit.
log.Printf("ensureDefaultConfig: workspace %s reached provisioning with empty model — Create handler should have rejected this; rendering empty model: \"\" in config.yaml (workspace will boot not_configured)",workspaceID)
}
// Derive the provider from the providers manifest and stamp it into the
// generated config BEFORE claude-code model normalization strips the
// slash-prefix. DeriveProvider needs the FULL, un-normalized model id
// (e.g. "moonshot/kimi-k2.6") for the exact-id match that resolves the
// canvas claude-code case to provider=platform — normalizing to
// "kimi-k2.6" first would lose that match.
//
// Why this exists (RFC#340 Fix A): a canvas-created claude-code workspace
// with model "moonshot/kimi-k2.6" booted NOT_CONFIGURED — the adapter
// derived provider="moonshot" (slash-split of the model id) which is not
// in the providers registry. CP bakes `provider: platform` via heredoc,
// but the cp#329 config-bundle fetch overwrites /configs/config.yaml with
// THIS (previously providerless) bundle version, so molecule-runtime
// config.py re-derived the wrong provider. Stamping the manifest-derived
// provider here (mirroring CP's buildModelProviderYAML shape) makes the
// config the adapter reads carry the canonical provider.
//
// Reuses the SAME manifest path the config-SAVE validators use
// (providerRegistry() + Manifest.DeriveProvider; see
// model_registry_validation.go). On a derive MISS (unknown/unregistered
// model, or registry unavailable) provider is left empty and the field is
// omitted below — preserving today's behavior; never fail provisioning on
t.Fatalf("providers SSOT lists no platform models for runtime %q — the regression matrix would be empty; the SSOT shape changed (this test is the canary)",runtime)
}
// Headline sentinel: the exact id that booted NOT_CONFIGURED in prod MUST be
// in the enumerated set. If a refactor drops it from the platform arm, this
// test must still cover it explicitly — fail loud rather than silently
t.Fatalf("the headline incident model \"moonshot/kimi-k2.6\" is no longer in the claude-code platform SSOT set (%v) — regression coverage for the original bug would be lost",platformModels)
t.Fatalf("expected config.yaml in generated files for model %q",model)
}
varparsedstruct{
Modelstring`yaml:"model"`
Providerstring`yaml:"provider"`
RuntimeConfigstruct{
Modelstring`yaml:"model"`
Providerstring`yaml:"provider"`
}`yaml:"runtime_config"`
}
iferr:=yaml.Unmarshal(raw,&parsed);err!=nil{
t.Fatalf("generated YAML invalid for model %q: %v\n%s",model,err,raw)
}
// The load-bearing invariant: BOTH the top-level and the
// runtime_config provider must be exactly "platform". An empty or
// vendor-namespace ("moonshot") value here is the prod NOT_CONFIGURED
// boot — the adapter would slash-split the model id and look up an
// unregistered provider.
ifparsed.Provider!="platform"{
t.Errorf("model %q: top-level provider = %q, want \"platform\" (Fix A invariant — empty/vendor value is the NOT_CONFIGURED boot)\n%s",model,parsed.Provider,raw)
t.Fatalf("no platform models for %q in SSOT",runtime)
}
for_,model:=rangeplatformModels{
model:=model
t.Run(model,func(t*testing.T){
// nil availableAuthEnv mirrors deriveDefaultConfigProvider's call at
// config-generation time (no per-workspace auth context yet).
p,err:=m.DeriveProvider(runtime,model,nil)
iferr!=nil{
t.Fatalf("DeriveProvider(%q, %q): unexpected error %v — an SSOT-offered platform model MUST derive",runtime,model,err)
}
ifp.Name!="platform"{
t.Errorf("DeriveProvider(%q, %q).Name = %q, want \"platform\" (this is the exact slash-split-to-vendor regression that booted NOT_CONFIGURED)",runtime,model,p.Name)
}
})
}
}
// containsString is a tiny local membership helper. Kept here (not a shared
// test util) so this regression file is self-contained and can be read top to
// bottom without chasing helpers across the package.
{Name:"kimi-coding",DisplayName:"Moonshot Kimi (coding-tuned)",Protocol:"anthropic",AuthMode:"third_party_anthropic_compat",AuthEnv:[]string{"KIMI_API_KEY","ANTHROPIC_API_KEY","ANTHROPIC_AUTH_TOKEN"},ModelPrefixMatch:"^kimi-",IsPlatform:false},
@@ -89,8 +89,9 @@ var Runtimes = map[string][]GenRuntimeRef{
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.