fix(workspace-server): provision-time billing derives from EFFECTIVE model, not raw payload.Model (#1994) [BEHAVIOR-AFFECTING — CTO merge-go] #1995

Merged
hongming merged 3 commits from fix/1994-provision-billing-model-passthrough into main 2026-05-28 20:01:00 +00:00
Owner

Fixes #1994. BEHAVIOR-AFFECTING (provisioning hot path) — DO NOT MERGE without CTO merge-go. tier:medium.

Phase 1 — evidence (brief-falsification)

Brief claims (each a hypothesis until verified)

  • [H1] readWorkspaceDeriveInputs (read path) misses global_secrets, so availableAuthEnv differs between read & provision → DeriveProvider tie-breaks differently for opus.
  • [H2] The provision path builds availableAuthEnv from a different source than the read path.
  • [H3] DeriveProvider(claude-code, opus) is auth-env-SENSITIVE (ambiguous platform-anthropic vs anthropic-oauth, tie-broken by CLAUDE_CODE_OAUTH_TOKEN).
  • [H4] The circular strip (applyPlatformManagedLLMEnv + stripGlobalOriginLLMCreds) strips the global-origin oauth and the fix should MATERIALIZE that oauth (do not strip it).

Verification + evidence

  • H3 FALSIFIED (decisive). registry_gen.go: claude-code native set lists opus EXACT-ONLY in anthropic-oauth (Models: [sonnet,opus,haiku,…]); platform lists anthropic/claude-opus-4-7, anthropic-api lists claude-opus-4-7. DeriveProvider Step 3 (exact model-id match) resolves opus → anthropic-oauth deterministically, auth-env-INSENSITIVE. So availableAuthEnv is irrelevant for this model.
  • H1/H2 are real but NOT the cause. The read path (readWorkspaceDeriveInputs) does query only workspace_secrets; the provision path computes availableAuthEnvNames from the merged env (which DOES include global_secrets). A genuine latent divergence — but moot here because opus resolves by exact match before the auth-env tie-break runs.
  • ACTUAL ROOT CAUSE — model passthrough, not auth-env. applyPlatformManagedLLMEnv(... payload.Runtime, payload.Model) passes the RAW payload.Model. On a re-provision (workspace_restart.go:333 restart, :844 auto-restart, :1017 resume), the payload is rebuilt via withStoredCompute(ctx, id, CreateWorkspacePayload{Name, Tier, Runtime})withStoredCompute backfills Compute but NOT Model, so payload.Model == "". DeriveProvider(claude-code, "") errors ("model is required") → resolver defaults closed to platform_managed → bakes ANTHROPIC_BASE_URL=<platform proxy>. The READ endpoint reads MODEL from workspace_secrets (opus) → derives anthropic-oauth → byok. That is the divergence, and it is deterministic on every re-provision (explains "the canary came back identical").
  • H4 FALSIFIED. SetGlobal REJECTS bypass-list keys (incl. CLAUDE_CODE_OAUTH_TOKEN) via rejectPlatformManagedDirectLLMBypass, so an oauth in global_secrets is the PLATFORM's token (the original drain source), not the customer's. internal#711 + TestPrepareProvisionContext_ByokWithOnlyGlobalOAuthFailsClosed deliberately strip it and fail closed. Materializing it would re-introduce the drain. The customer's OWN oauth must be a workspace_secrets row (survives the provenance-aware strip). The fix must NOT preserve the global-origin oauth.
  • PM org_default vs marketing derived_provider. org_default is the RETIRED source label (internal#718 P2-B); the current resolver emits derived_default when derive fails. Marketing has a MODEL=opus workspace_secret (read path derives byok); PM resolved platform via the default-closed path (no derivable model on the read inputs at observation time). Input difference = the stored MODEL, exactly as the root cause predicts.

Live confirmation (today, SSM into i-0ad447e8433902973 = ws-tenant-reno-stars-6b66de8d)

MODEL=opus
MOLECULE_LLM_BILLING_MODE_RESOLVED=platform_managed   <- WRONG (should be byok)
ANTHROPIC_BASE_URL=https://api.moleculesai.app/api/v1/internal/llm/anthropic  <- platform proxy (the drain route)

MODEL=opus is in the container (set by applyRuntimeModelEnv's envVars["MODEL"] fallback) — proving the model WAS available in envVars, just never passed to the billing resolver.

Phase 2 — design

Single source of truth for effective model: extract effectiveModelForBilling(model, envVars) (the exact fallback chain applyRuntimeModelEnv already used: explicit → MOLECULE_MODELMODEL) and have the billing resolver consult it, so provision-path derive inputs equal read-path inputs. The stored model already lives in the merged envVars (loadWorkspaceSecrets) — no new DB query. Rollback: single-function git revert.

Phase 3 — implementation + tests

workspace-server/internal/handlers/workspace_provision.go: +effectiveModelForBilling helper; applyRuntimeModelEnv refactored to call it (DRY, keeps the two model consumers in sync — the divergence existed because they didn't share this logic); applyPlatformManagedLLMEnv derives from effectiveModelForBilling(model, envVars).

Tests (llm_billing_mode_provision_parity_test.go):

  • TestApplyPlatformManagedLLMEnv_ReProvisionUsesStoredModel — payload.Model="" + stored MODEL=opus + own (workspace-origin) oauth ⇒ byok, no platform usage-token injected, own oauth survives.
  • TestApplyPlatformManagedLLMEnv_ReadProvisionParity — read-path resolver and provision-path resolver land on the SAME mode for identical workspace inputs (core regression guard).
  • TestApplyPlatformManagedLLMEnv_DefaultPreservation — no model + no cred ⇒ platform_managed (CTO default-stays-platform).
  • TestApplyPlatformManagedLLMEnv_ByokGlobalOnlyOAuthStillFailsClosed#711 guard: a global-origin platform oauth is still STRIPPED on byok (no drain).
  • TestReProvisionPayloadOmitsModel — pins the upstream trigger (re-provision payload omits Model).

Mutation evidence: reverting the envVars fallback in effectiveModelForBilling turns ReProvision + Parity + #711-guard RED; DefaultPreservation correctly stays GREEN (no model to fall back to). Load-bearing.

Verification: go build ./... ✓ · go build -tags=integration ./... ✓ · go test -tags=integration ./... ✓ (43 pkgs ok, 0 FAIL) · full ./internal/handlers/ ✓ · golangci-lint run ./internal/handlers/ → 0 issues · gofmt clean. Stage A0 provisioner-parity is Docker-gated (CI runner exercises it; this change is orthogonal to the token/file-delivery path it covers).

Phase 4 — five-axis self-review

  • Correctness: No finding — precedence unchanged; empty-everywhere still fails closed to platform_managed; #711 strip/fail-closed unchanged; opus exact-match is deterministic. Idempotent.
  • Readability: No finding — verb-noun helper, WHY-comment, removes a duplicated chain (net simpler). Helper placed after applyRuntimeModelEnv so the preceding doc-comment stays attached.
  • Architecture: FYI — provision keys off payload.* while read keys off stored DB inputs (latent divergence surface). This fix is the correct narrowing (SSOT for model); fuller convergence is larger scope and risks the genuinely-platform path. Deferred.
  • Security: No finding — no trust boundary touched, no secret persisted; reduces the billing-leak risk; global-origin platform oauth still stripped on byok.
  • Performance: No finding — two map lookups, no new DB round-trip.

Post-merge expectation

Once merged + deployed, re-provisioning the marketing agent (6b66de8d) will resolve byok (opus → anthropic-oauth) and reach the byok branch: no platform-proxy override. The earlier-failed canary now succeeds for any workspace whose customer set their OWN oauth as a workspace_secret. Caveat: a workspace whose only oauth is the PLATFORM's global-origin token (e.g. if reno's key was only in global_secrets) will correctly hit MISSING_BYOK_CREDENTIAL (internal#711) — the customer must supply their own credential via the canvas Secrets tab. That is intended behavior, not a regression. agents-team genuinely-platform workspaces stay platform_managed.

🤖 Generated with Claude Code

Fixes #1994. **BEHAVIOR-AFFECTING (provisioning hot path) — DO NOT MERGE without CTO merge-go.** `tier:medium`. ## Phase 1 — evidence (brief-falsification) ### Brief claims (each a hypothesis until verified) - [H1] `readWorkspaceDeriveInputs` (read path) misses `global_secrets`, so `availableAuthEnv` differs between read & provision → `DeriveProvider` tie-breaks differently for `opus`. - [H2] The provision path builds `availableAuthEnv` from a different source than the read path. - [H3] `DeriveProvider(claude-code, opus)` is auth-env-SENSITIVE (ambiguous platform-anthropic vs anthropic-oauth, tie-broken by `CLAUDE_CODE_OAUTH_TOKEN`). - [H4] The circular strip (`applyPlatformManagedLLMEnv` + `stripGlobalOriginLLMCreds`) strips the global-origin oauth and the fix should MATERIALIZE that oauth (do not strip it). ### Verification + evidence - **H3 FALSIFIED (decisive).** registry_gen.go: `claude-code` native set lists `opus` EXACT-ONLY in `anthropic-oauth` (`Models: [sonnet,opus,haiku,…]`); `platform` lists `anthropic/claude-opus-4-7`, `anthropic-api` lists `claude-opus-4-7`. `DeriveProvider` Step 3 (exact model-id match) resolves `opus → anthropic-oauth` **deterministically, auth-env-INSENSITIVE**. So `availableAuthEnv` is irrelevant for this model. - **H1/H2 are real but NOT the cause.** The read path (`readWorkspaceDeriveInputs`) does query only `workspace_secrets`; the provision path computes `availableAuthEnvNames` from the merged env (which DOES include `global_secrets`). A genuine latent divergence — but moot here because `opus` resolves by exact match before the auth-env tie-break runs. - **ACTUAL ROOT CAUSE — model passthrough, not auth-env.** `applyPlatformManagedLLMEnv(... payload.Runtime, payload.Model)` passes the RAW `payload.Model`. On a **re-provision** (`workspace_restart.go:333` restart, `:844` auto-restart, `:1017` resume), the payload is rebuilt via `withStoredCompute(ctx, id, CreateWorkspacePayload{Name, Tier, Runtime})` — `withStoredCompute` backfills `Compute` but **NOT `Model`**, so `payload.Model == ""`. `DeriveProvider(claude-code, "")` errors ("model is required") → resolver defaults closed to `platform_managed` → bakes `ANTHROPIC_BASE_URL=<platform proxy>`. The READ endpoint reads `MODEL` from `workspace_secrets` (`opus`) → derives `anthropic-oauth` → byok. **That is the divergence**, and it is deterministic on every re-provision (explains "the canary came back identical"). - **H4 FALSIFIED.** `SetGlobal` REJECTS bypass-list keys (incl. `CLAUDE_CODE_OAUTH_TOKEN`) via `rejectPlatformManagedDirectLLMBypass`, so an oauth in `global_secrets` is the PLATFORM's token (the original drain source), not the customer's. internal#711 + `TestPrepareProvisionContext_ByokWithOnlyGlobalOAuthFailsClosed` deliberately strip it and fail closed. Materializing it would re-introduce the drain. The customer's OWN oauth must be a `workspace_secrets` row (survives the provenance-aware strip). **The fix must NOT preserve the global-origin oauth.** - **PM `org_default` vs marketing `derived_provider`.** `org_default` is the RETIRED source label (internal#718 P2-B); the current resolver emits `derived_default` when derive fails. Marketing has a `MODEL=opus` workspace_secret (read path derives byok); PM resolved platform via the default-closed path (no derivable model on the read inputs at observation time). Input difference = the stored MODEL, exactly as the root cause predicts. ### Live confirmation (today, SSM into `i-0ad447e8433902973` = ws-tenant-reno-stars-6b66de8d) ``` MODEL=opus MOLECULE_LLM_BILLING_MODE_RESOLVED=platform_managed <- WRONG (should be byok) ANTHROPIC_BASE_URL=https://api.moleculesai.app/api/v1/internal/llm/anthropic <- platform proxy (the drain route) ``` `MODEL=opus` is in the container (set by `applyRuntimeModelEnv`'s `envVars["MODEL"]` fallback) — proving the model WAS available in `envVars`, just never passed to the billing resolver. ## Phase 2 — design Single source of truth for *effective model*: extract `effectiveModelForBilling(model, envVars)` (the exact fallback chain `applyRuntimeModelEnv` already used: explicit → `MOLECULE_MODEL` → `MODEL`) and have the billing resolver consult it, so provision-path derive inputs equal read-path inputs. The stored model already lives in the merged `envVars` (`loadWorkspaceSecrets`) — no new DB query. Rollback: single-function `git revert`. ## Phase 3 — implementation + tests `workspace-server/internal/handlers/workspace_provision.go`: +`effectiveModelForBilling` helper; `applyRuntimeModelEnv` refactored to call it (DRY, keeps the two model consumers in sync — the divergence existed because they didn't share this logic); `applyPlatformManagedLLMEnv` derives from `effectiveModelForBilling(model, envVars)`. Tests (`llm_billing_mode_provision_parity_test.go`): - `TestApplyPlatformManagedLLMEnv_ReProvisionUsesStoredModel` — payload.Model="" + stored MODEL=opus + own (workspace-origin) oauth ⇒ byok, no platform usage-token injected, own oauth survives. - `TestApplyPlatformManagedLLMEnv_ReadProvisionParity` — read-path resolver and provision-path resolver land on the SAME mode for identical workspace inputs (core regression guard). - `TestApplyPlatformManagedLLMEnv_DefaultPreservation` — no model + no cred ⇒ platform_managed (CTO default-stays-platform). - `TestApplyPlatformManagedLLMEnv_ByokGlobalOnlyOAuthStillFailsClosed` — #711 guard: a global-origin platform oauth is still STRIPPED on byok (no drain). - `TestReProvisionPayloadOmitsModel` — pins the upstream trigger (re-provision payload omits Model). **Mutation evidence:** reverting the `envVars` fallback in `effectiveModelForBilling` turns ReProvision + Parity + #711-guard RED; DefaultPreservation correctly stays GREEN (no model to fall back to). Load-bearing. **Verification:** `go build ./...` ✓ · `go build -tags=integration ./...` ✓ · `go test -tags=integration ./...` ✓ (43 pkgs ok, 0 FAIL) · full `./internal/handlers/` ✓ · `golangci-lint run ./internal/handlers/` → 0 issues · `gofmt` clean. Stage A0 provisioner-parity is Docker-gated (CI runner exercises it; this change is orthogonal to the token/file-delivery path it covers). ## Phase 4 — five-axis self-review - **Correctness:** No finding — precedence unchanged; empty-everywhere still fails closed to platform_managed; #711 strip/fail-closed unchanged; `opus` exact-match is deterministic. Idempotent. - **Readability:** No finding — verb-noun helper, WHY-comment, removes a duplicated chain (net simpler). Helper placed after `applyRuntimeModelEnv` so the preceding doc-comment stays attached. - **Architecture:** FYI — provision keys off `payload.*` while read keys off stored DB inputs (latent divergence surface). This fix is the correct narrowing (SSOT for model); fuller convergence is larger scope and risks the genuinely-platform path. Deferred. - **Security:** No finding — no trust boundary touched, no secret persisted; reduces the billing-leak risk; global-origin platform oauth still stripped on byok. - **Performance:** No finding — two map lookups, no new DB round-trip. ## Post-merge expectation Once merged + deployed, re-provisioning the marketing agent (`6b66de8d`) will resolve **byok** (opus → anthropic-oauth) and reach the byok branch: no platform-proxy override. The earlier-failed canary now succeeds for any workspace whose customer set their OWN oauth as a `workspace_secret`. **Caveat:** a workspace whose only oauth is the PLATFORM's global-origin token (e.g. if reno's key was only in `global_secrets`) will correctly hit `MISSING_BYOK_CREDENTIAL` (internal#711) — the customer must supply their own credential via the canvas Secrets tab. That is intended behavior, not a regression. agents-team genuinely-platform workspaces stay platform_managed. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
hongming added 1 commit 2026-05-28 18:46:58 +00:00
fix(workspace-server): provision-time billing derives from EFFECTIVE model, not raw payload.Model (molecule-core#1994)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 7s
CI / Detect changes (pull_request) Successful in 9s
E2E API Smoke Test / detect-changes (pull_request) Successful in 7s
E2E Chat / detect-changes (pull_request) Successful in 8s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 8s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 4s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 45s
gate-check-v3 / gate-check (pull_request) Successful in 5s
qa-review / approved (pull_request) Failing after 10s
verify-providers-gen / Regenerate providers artifact and fail on drift (pull_request) Successful in 39s
security-review / approved (pull_request) Failing after 5s
sop-checklist / review-refire (pull_request) Has been skipped
sop-tier-check / tier-check (pull_request) Successful in 6s
CI / Canvas (Next.js) (pull_request) Successful in 3s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 6s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m7s
E2E Chat / E2E Chat (pull_request) Successful in 23s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 6s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m2s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
Harness Replays / Harness Replays (pull_request) Successful in 3s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m37s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Successful in 5m10s
CI / Platform (Go) (pull_request) Successful in 5m54s
CI / all-required (pull_request) Successful in 7m19s
442f79a987
The provision-time LLM billing resolver diverged from the read endpoint:
a byok workspace (claude-code, opus) was provisioned platform_managed and
routed through the platform LLM proxy, billing the platform Anthropic key
for the customer own usage (Reno Stars Marketing 6b66de8d; live-confirmed
2026-05-28).

Root cause: applyPlatformManagedLLMEnv passed the RAW payload.Model to
ResolveLLMBillingModeDerived. On a re-provision (restart/resume/
auto-restart) the payload is rebuilt from the DB with Name+Tier+Runtime
only (workspace_restart.go:333/844/1017 via withStoredCompute, which
backfills Compute but NOT Model), so payload.Model == "". DeriveProvider
errors on an empty model, the resolver defaults closed to platform_managed
and bakes ANTHROPIC_BASE_URL=<platform proxy>. The read endpoint
(ResolveLLMBillingMode -> readWorkspaceDeriveInputs) reads MODEL from
workspace_secrets, derives opus -> anthropic-oauth -> byok. Divergence,
deterministic on every re-provision.

Fix: extract effectiveModelForBilling (the fallback chain
applyRuntimeModelEnv already used: explicit -> MOLECULE_MODEL -> MODEL)
into a shared helper and have the billing resolver consult it, so the
provision-path derive inputs match the read-path. The stored model already
lives in the merged envVars (loadWorkspaceSecrets) — no new DB query. The
byok branch (no proxy override; strip only global-origin platform creds;
fail-closed on missing own cred, internal#711) is preserved unchanged;
genuinely-platform and no-model workspaces still default platform_managed
(CTO: default stays platform).

Tests (mutation-load-bearing): re-provision-uses-stored-model byok repro,
read/provision parity guard, default-preservation, and the #711 global-
only-oauth fail-closed guard. Reverting the envVars fallback turns the
repro + parity + #711 tests RED; default-preservation stays GREEN.

BEHAVIOR-AFFECTING (provisioning hot path) — needs CTO merge-go.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
hongming added the tier:medium label 2026-05-28 18:47:18 +00:00
Author
Owner

NOT MERGED - needs CTO merge-go. Behavior-affecting (provisioning hot path / LLM billing). tier:medium. Two non-author approvals + green required CI per BP, plus explicit CTO go before merge. Merge with a merge COMMIT (never squash).

**NOT MERGED - needs CTO merge-go.** Behavior-affecting (provisioning hot path / LLM billing). `tier:medium`. Two non-author approvals + green required CI per BP, plus explicit CTO go before merge. Merge with a merge COMMIT (never squash).
hongming added 1 commit 2026-05-28 18:58:20 +00:00
test(e2e): add real-completion + per-provider liveness + byok-routing A2A gate (#1994 follow-on)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 11s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 22s
CI / Python Lint & Test (pull_request) Successful in 16s
CI / Detect changes (pull_request) Successful in 38s
E2E API Smoke Test / detect-changes (pull_request) Successful in 22s
E2E Chat / detect-changes (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 10s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 7s
Harness Replays / detect-changes (pull_request) Successful in 5s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 12s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 3s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 34s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Failing after 1m7s
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Successful in 1m14s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m11s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 8s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m24s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m3s
gate-check-v3 / gate-check (pull_request) Successful in 5s
qa-review / approved (pull_request) Failing after 8s
security-review / approved (pull_request) Failing after 12s
verify-providers-gen / Regenerate providers artifact and fail on drift (pull_request) Successful in 27s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 4s
sop-checklist / review-refire (pull_request) Has been skipped
sop-tier-check / tier-check (pull_request) Successful in 6s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m25s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 4m31s
CI / Canvas (Next.js) (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 17s
E2E Chat / E2E Chat (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m12s
Harness Replays / Harness Replays (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m16s
CI / Platform (Go) (pull_request) Successful in 6m10s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 13m28s
3269e93216
The A2A e2e historically asserted only response SHAPE (test_a2a_e2e.sh
checked '"kind":"text"' only). A broken agent returns its error AS a
text part -- {"kind":"text","text":"Agent error (Exception) ..."} --
which STILL matches the shape check, so it PASSED on a fully broken
agent. That is why the 2026-05-2x drained-key / byok-misroute failures
(agents-team PM + reno marketing erroring on every LLM call) sailed
through CI. "Channel returns text shape" is not "agent completed an LLM
round-trip."

Adds, ADDITIVELY (no existing assertion weakened or removed):

- tests/e2e/lib/completion_assert.sh -- reusable gates:
  * a2a_assert_real_completion: deterministic known-answer round-trip;
    asserts CONTAINS the expected token AND NOT an error-as-text marker
    (Agent error / Exception / error result / MISSING_BYOK_CREDENTIAL).
  * provider_liveness_matrix + offered_platform_models_for_runtime:
    per-offered-provider cheap (max_tokens:4) probe; the offered set is
    read from the providers.yaml SSOT (runtimes.<rt>.providers[platform]
    .models) -- not a hardcoded list -- so the matrix tracks the SSOT.
  * assert_byok_not_platform_proxy: #1994 regression guard -- a
    byok-resolving workspace must NOT resolve platform_managed (reads the
    same derived resolver GET /admin/workspaces/:id/llm-billing-mode the
    provision strip gate uses).

- tests/e2e/test_staging_full_saas.sh (the live-agent lane, MiniMax
  primary): new stanzas 8b (PINEAPPLE known-answer, the core gate),
  8c (byok-routing guard), 8d (SSOT-driven per-provider liveness matrix).

- tests/e2e/test_a2a_e2e.sh: added check_no_error_as_text on Echo + SEO
  replies so the brief's literal shape-only example now FAILS on an
  error-as-text payload.

- tests/e2e/test_completion_assert_unit.sh: offline fail-direction proof
  (16 cases) that the negative gates are load-bearing -- error-as-text
  MUST fail, platform_managed MUST trip the #1994 guard. Wired into
  ci.yml "Run E2E bash unit tests (no live infra)" (required, per-PR +
  main). e2e-staging-saas.yml paths filter extended to re-trigger the
  live lane on lib changes.

No #1994 fix code touched -- tests/e2e + workflow wiring only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Author
Owner

Added: comprehensive real-completion A2A e2e (same PR — fix + e2e ship as one unit)

Pushed 3269e932 onto this branch. Tests/e2e + workflow wiring only — the #1994 fix code (workspace_provision.go + parity tests) is untouched.

Why this e2e (the gap it closes)

The A2A e2e historically asserted only response shapetest_a2a_e2e.sh: check "...has text" '"kind":"text"'. A broken agent returns its error as a text part:

{"kind":"text","text":"Agent error (Exception) — see workspace logs for details."}

which STILL matches "kind":"text" → the shape check passed on a fully broken agent. That is exactly why the 2026-05-2x drained-key / byok-misroute failures (agents-team PM + reno marketing erroring on every LLM call) sailed through CI. "Channel returns text shape" ≠ "agent completed an LLM round-trip."

What was added (additive — no existing assertion weakened)

  1. tests/e2e/lib/completion_assert.sh — reusable gates:

    • a2a_assert_real_completion — deterministic known-answer round-trip: asserts the reply CONTAINS the expected token (case-insensitive) AND NOT any error-as-text marker (Agent error / Exception / error result / MISSING_BYOK_CREDENTIAL).
    • provider_liveness_matrix + offered_platform_models_for_runtime — per-offered-provider cheap (max_tokens:4) probe; the offered set is read from the providers.yaml SSOT (runtimes.<runtime>.providers[platform].models), not a hardcoded list, so the matrix tracks the SSOT automatically. Unconfigured-in-lane providers are SKIP (rc=75), not FAIL — deterministic + low-flake.
    • assert_byok_not_platform_proxy#1994 regression guard: a byok-resolving workspace must NOT resolve platform_managed. Reads the same derived resolver the provision-time strip gate uses (GET /admin/workspaces/:id/llm-billing-moderesolved_mode/provider_selection). A regression of #1994 (byok→platform-proxy) flips resolved_mode to platform_managed and trips this RED.
  2. tests/e2e/test_staging_full_saas.sh (the live-agent lane — provisions a real staging org, MiniMax-primary key): new stanzas after the existing parent A2A check —

    • 8b PINEAPPLE known-answer round-trip (the core gate; the one place we spend a non-trivial token budget, on the cheap MiniMax backend; retry-once on transient 502/503/504, never on agent-error).
    • 8c byok-routing guard (only when the parent carries its own vendor key; the OpenAI/no-key path is legitimately platform_managed).
    • 8d SSOT-driven per-provider liveness matrix (logs the pass/SKIP/FAIL matrix).
  3. tests/e2e/test_a2a_e2e.sh — added check_no_error_as_text on the Echo + SEO replies, so the brief's literal shape-only example now FAILS on an error-as-text payload (the existing "kind":"text" shape checks stay).

  4. tests/e2e/test_completion_assert_unit.sh — offline fail-direction proof (16 cases, no LLM/network) that the negative gates are load-bearing: an error-as-text payload MUST FAIL the real-completion gate; a platform_managed resolution MUST TRIP the #1994 guard.

CI wiring

  • test_completion_assert_unit.sh runs in ci.yml → "Run E2E bash unit tests (no live infra)" (alongside test_model_slug.sh) — the required Shellcheck (E2E scripts) job, on push + PR to main/staging. So a refactor that weakens the gate to a shape check goes red on every PR, fast lane, no live infra.
  • e2e-staging-saas.yml paths filter extended with tests/e2e/lib/completion_assert.sh so a lib change re-triggers the live MiniMax-backed lane (push-to-main + nightly + provisioning-critical PRs).

Evidence the negative assertion is load-bearing

test_completion_assert_unit.sh: 16 passed, 0 failed, including the decisive cases — Agent error (Exception) ... rejected, MISSING_BYOK_CREDENTIAL rejected, error-as-text that also contains the token rejected, and resolved_mode=platform_managed tripping the #1994 guard. bash -n clean + shellcheck --severity=warning clean on all touched scripts; SSOT matrix read verified through the lib helper (returns the claude-code platform set: anthropic/claude-opus-4-7, anthropic/claude-sonnet-4-6, moonshot/kimi-k2.6, moonshot/kimi-k2.5, minimax/MiniMax-M2.7, minimax/MiniMax-M2.7-highspeed).

🤖 Generated with Claude Code

## Added: comprehensive real-completion A2A e2e (same PR — fix + e2e ship as one unit) Pushed `3269e932` onto this branch. **Tests/e2e + workflow wiring only — the #1994 fix code (`workspace_provision.go` + parity tests) is untouched.** ### Why this e2e (the gap it closes) The A2A e2e historically asserted only response **shape** — `test_a2a_e2e.sh`: `check "...has text" '"kind":"text"'`. A broken agent returns its error **as** a text part: ```json {"kind":"text","text":"Agent error (Exception) — see workspace logs for details."} ``` which STILL matches `"kind":"text"` → the shape check **passed on a fully broken agent**. That is exactly why the 2026-05-2x drained-key / byok-misroute failures (agents-team PM + reno marketing erroring on every LLM call) sailed through CI. "Channel returns text shape" ≠ "agent completed an LLM round-trip." ### What was added (additive — no existing assertion weakened) 1. **`tests/e2e/lib/completion_assert.sh`** — reusable gates: - `a2a_assert_real_completion` — deterministic known-answer round-trip: asserts the reply **CONTAINS** the expected token (case-insensitive) **AND NOT** any error-as-text marker (`Agent error` / `Exception` / `error result` / `MISSING_BYOK_CREDENTIAL`). - `provider_liveness_matrix` + `offered_platform_models_for_runtime` — per-offered-provider cheap (`max_tokens:4`) probe; the offered set is read from the **`providers.yaml` SSOT** (`runtimes.<runtime>.providers[platform].models`), not a hardcoded list, so the matrix tracks the SSOT automatically. Unconfigured-in-lane providers are `SKIP` (rc=75), not FAIL — deterministic + low-flake. - `assert_byok_not_platform_proxy` — **#1994 regression guard**: a byok-resolving workspace must NOT resolve `platform_managed`. Reads the **same derived resolver** the provision-time strip gate uses (`GET /admin/workspaces/:id/llm-billing-mode` → `resolved_mode`/`provider_selection`). A regression of #1994 (byok→platform-proxy) flips `resolved_mode` to `platform_managed` and trips this RED. 2. **`tests/e2e/test_staging_full_saas.sh`** (the live-agent lane — provisions a real staging org, MiniMax-primary key): new stanzas after the existing parent A2A check — - **8b** PINEAPPLE known-answer round-trip (the core gate; the one place we spend a non-trivial token budget, on the cheap MiniMax backend; retry-once on transient 502/503/504, never on agent-error). - **8c** byok-routing guard (only when the parent carries its own vendor key; the OpenAI/no-key path is legitimately `platform_managed`). - **8d** SSOT-driven per-provider liveness matrix (logs the pass/SKIP/FAIL matrix). 3. **`tests/e2e/test_a2a_e2e.sh`** — added `check_no_error_as_text` on the Echo + SEO replies, so the brief's literal shape-only example now FAILS on an error-as-text payload (the existing `"kind":"text"` shape checks stay). 4. **`tests/e2e/test_completion_assert_unit.sh`** — offline fail-direction proof (16 cases, no LLM/network) that the negative gates are **load-bearing**: an error-as-text payload **MUST FAIL** the real-completion gate; a `platform_managed` resolution **MUST TRIP** the #1994 guard. ### CI wiring - `test_completion_assert_unit.sh` runs in **`ci.yml` → "Run E2E bash unit tests (no live infra)"** (alongside `test_model_slug.sh`) — the **required Shellcheck (E2E scripts)** job, on **push + PR to main/staging**. So a refactor that weakens the gate to a shape check goes red on every PR, fast lane, no live infra. - `e2e-staging-saas.yml` paths filter extended with `tests/e2e/lib/completion_assert.sh` so a lib change re-triggers the live MiniMax-backed lane (push-to-main + nightly + provisioning-critical PRs). ### Evidence the negative assertion is load-bearing `test_completion_assert_unit.sh`: **16 passed, 0 failed**, including the decisive cases — `Agent error (Exception) ...` rejected, `MISSING_BYOK_CREDENTIAL` rejected, error-as-text that *also* contains the token rejected, and `resolved_mode=platform_managed` tripping the #1994 guard. `bash -n` clean + `shellcheck --severity=warning` clean on all touched scripts; SSOT matrix read verified through the lib helper (returns the claude-code platform set: `anthropic/claude-opus-4-7`, `anthropic/claude-sonnet-4-6`, `moonshot/kimi-k2.6`, `moonshot/kimi-k2.5`, `minimax/MiniMax-M2.7`, `minimax/MiniMax-M2.7-highspeed`). 🤖 Generated with [Claude Code](https://claude.com/claude-code)
hongming added 1 commit 2026-05-28 19:46:02 +00:00
fix(workspace-server): byok runs on the tenant's own global-scope LLM cred; stop stripping it (molecule-core#1994)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 9s
CI / Python Lint & Test (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 10s
E2E Chat / detect-changes (pull_request) Successful in 8s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 12s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 12s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 14s
Harness Replays / detect-changes (pull_request) Successful in 7s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 3s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 33s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 11s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Failing after 1m1s
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Successful in 1m13s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m11s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 56s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m29s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
verify-providers-gen / Regenerate providers artifact and fail on drift (pull_request) Successful in 33s
gate-check-v3 / gate-check (pull_request) Successful in 3s
qa-review / approved (pull_request) Failing after 4s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m26s
security-review / approved (pull_request) Failing after 5s
sop-checklist / review-refire (pull_request) Has been skipped
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 5s
sop-tier-check / tier-check (pull_request) Successful in 6s
CI / Canvas (Next.js) (pull_request) Successful in 2s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 15s
E2E Chat / E2E Chat (pull_request) Successful in 15s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 10s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 5m23s
Harness Replays / Harness Replays (pull_request) Successful in 5s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m18s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m27s
CI / Platform (Go) (pull_request) Successful in 6m4s
CI / all-required (pull_request) Successful in 9m54s
audit-force-merge / audit (pull_request) Successful in 8s
bbb445b956
Corrected-model credential fix (CTO-confirmed). `global_secrets` is the
TENANT's own secret store (shared across that tenant's workspaces), NOT the
platform's. The platform's own LLM credential is the CP proxy usage token,
injected separately on the platform_managed path; it is never stored in a
tenant's global_secrets.

The internal#711 provider-aware strip rested on the inverted premise that a
global-scope LLM credential was "the platform's own". On the byok/disabled
branch it stripped the tenant's OWN oauth when that oauth lived at global
scope, leaving the workspace credential-less -> MISSING_BYOK_CREDENTIAL ->
dead (Reno Stars Marketing/SEO byok agents, live-confirmed 2026-05-28).

Changes:
- workspace_provision.go: remove the stripGlobalOriginLLMCreds call on the
  byok/disabled branch; delete the now-dead function; drop the unused
  globalKeys parameter from applyPlatformManagedLLMEnv.
- secrets.go: remove the symmetric byok strip on the remote-pull path
  (GET /workspaces/:id/secrets/values) + its now-unused globalKeys tracking;
  the bundle is the tenant's merged secrets served verbatim.
- platform_managed path UNCHANGED: still strips direct oauth + forces the CP
  proxy usage token (metered). Only byok/disabled stop being stripped.
- Fail-closed UNCHANGED in spirit: a byok workspace with no LLM credential at
  ANY scope still aborts MISSING_BYOK_CREDENTIAL; the trigger narrowed from
  "no workspace-scoped cred" to "no cred at any scope".

Guard (co-mingling prevention at the write boundary):
- SetGlobal still rejects bypass-list keys for a platform_managed tenant
  (keeps a platform-shaped credential out of global_secrets going forward);
  added a regression test pinning it.

Tests: inverted the strip-asserting unit + e2e tests to the corrected model
(global-scope oauth survives, byok runs direct, no proxy); added genuinely-
credential-less byok fail-closed coverage; all three behavior changes are
mutation-load-bearing (re-adding either strip / dropping the SetGlobal guard
turns the respective test RED). build + vet + golangci-lint + the full
integration-tagged handlers suite green. The #1994 model-passthrough fix and
the MiniMax A2A e2e on this branch are untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Author
Owner

Third piece added: clean-SSOT credential fix (molecule-core#1994 credential follow-on)

Head is now bbb445b9 (was 3269e932). This commit is additive to the #1994 model-passthrough fix + the MiniMax A2A e2e already on this branch — both untouched.

What changed (behavior)

  • byok / disabled: no longer strips the tenant's global-origin LLM creds. global_secrets is the tenant's own store (CTO-confirmed), so the tenant's own oauth — at global OR workspace scope — is exactly what byok runs on, direct. Removed the strip in workspace_provision.go::applyPlatformManagedLLMEnv (deleted the dead stripGlobalOriginLLMCreds + the unused globalKeys param) and the symmetric strip on the remote-pull path in secrets.go::Values.
  • platform_managed: UNCHANGED — still strips direct oauth + forces the CP proxy usage token (metered). stripPlatformManagedLLMBypassEnv untouched.
  • fail-closed: a byok workspace with no LLM cred at ANY scope still aborts MISSING_BYOK_CREDENTIAL (trigger narrowed from "no workspace-scoped cred" to "no cred at any scope").
  • guard: SetGlobal still rejects bypass-list keys for a platform_managed tenant (keeps a platform-shaped credential out of global_secrets going forward); added a regression test.

Tests: strip-asserting unit + e2e tests inverted to the corrected model; all three behavior changes are mutation-load-bearing (re-adding either strip / dropping the SetGlobal guard → respective test RED, verified). go build, go vet, golangci-lint (0 issues), and the full -tags=integration handlers suite are green.


⚠️ Phase-1 live audit — co-mingling that needs CTO cleanup BEFORE merge

Removing the strip is safe for the oauth credential everywhere (each tenant's CLAUDE_CODE_OAUTH_TOKEN is distinct — verified by plaintext-hash, no shared/platform value). But there is one platform-origin credential co-mingled into three tenants' global_secrets that WILL re-drain once the strip is gone:

MINIMAX_API_KEY is byte-identical across chloe-dong, agents-team, hongming (plaintext-sha20 2544574e…, len 125) and matches the platform/operator key in /etc/molecule-bootstrap/personas/triage-operator/env on the op-host. This is the JRS-style "platform-keyed secret-reuse via DB copy" recreation pattern — a shared operator MiniMax key planted into tenant global_secrets, not the tenant's own.

Exposure: agents-team has explicit byok workspaces incl. "Dev Engineer B (MiniMax)". With the strip removed, that workspace would run MiniMax directly on the platform/operator's key (billing the platform), instead of the strip masking it.

Origin in code: the only in-core writers to global_secrets are SetGlobal (tenant-driven, guarded) and main.go::fixAdminTokenPlaceholder (ADMIN_TOKEN only) — neither plants this key. The shared MiniMax value was injected by an out-of-core recreation/secret-reuse flow (operator-side DB copy). No core handler reproduces it, so it's outside this PR's scope to fix in code.

Requested CTO action before merge: rotate/replace the shared MINIMAX_API_KEY in the chloe-dong / agents-team / hongming global_secrets with each tenant's OWN MiniMax key (or remove it where unused). The other co-mingled-key audit came back clean: all oauth + KIMI values are per-tenant distinct; jrs-auto and go-to-market have no bypass-list globals at all. (No tenant data was mutated by this audit — read-only.)

## Third piece added: clean-SSOT credential fix (molecule-core#1994 credential follow-on) Head is now `bbb445b9` (was `3269e932`). This commit is **additive** to the #1994 model-passthrough fix + the MiniMax A2A e2e already on this branch — both untouched. ### What changed (behavior) - **byok / disabled**: no longer strips the tenant's global-origin LLM creds. `global_secrets` is the **tenant's own** store (CTO-confirmed), so the tenant's own oauth — at global OR workspace scope — is exactly what byok runs on, direct. Removed the strip in `workspace_provision.go::applyPlatformManagedLLMEnv` (deleted the dead `stripGlobalOriginLLMCreds` + the unused `globalKeys` param) and the symmetric strip on the remote-pull path in `secrets.go::Values`. - **platform_managed**: UNCHANGED — still strips direct oauth + forces the CP proxy usage token (metered). `stripPlatformManagedLLMBypassEnv` untouched. - **fail-closed**: a byok workspace with no LLM cred at ANY scope still aborts `MISSING_BYOK_CREDENTIAL` (trigger narrowed from "no workspace-scoped cred" to "no cred at any scope"). - **guard**: `SetGlobal` still rejects bypass-list keys for a platform_managed tenant (keeps a platform-shaped credential out of global_secrets going forward); added a regression test. Tests: strip-asserting unit + e2e tests inverted to the corrected model; all three behavior changes are mutation-load-bearing (re-adding either strip / dropping the SetGlobal guard → respective test RED, verified). `go build`, `go vet`, `golangci-lint` (0 issues), and the full `-tags=integration` handlers suite are green. --- ## ⚠️ Phase-1 live audit — co-mingling that needs CTO cleanup BEFORE merge Removing the strip is safe for the **oauth** credential everywhere (each tenant's `CLAUDE_CODE_OAUTH_TOKEN` is distinct — verified by plaintext-hash, no shared/platform value). **But** there is one **platform-origin credential co-mingled into three tenants' `global_secrets`** that WILL re-drain once the strip is gone: **`MINIMAX_API_KEY` is byte-identical across `chloe-dong`, `agents-team`, `hongming`** (plaintext-sha20 `2544574e…`, len 125) **and matches the platform/operator key in `/etc/molecule-bootstrap/personas/triage-operator/env`** on the op-host. This is the JRS-style "platform-keyed secret-reuse via DB copy" recreation pattern — a shared operator MiniMax key planted into tenant global_secrets, not the tenant's own. Exposure: `agents-team` has explicit **byok** workspaces incl. "Dev Engineer B (MiniMax)". With the strip removed, that workspace would run MiniMax directly on the **platform/operator's** key (billing the platform), instead of the strip masking it. Origin in code: the only in-core writers to `global_secrets` are `SetGlobal` (tenant-driven, guarded) and `main.go::fixAdminTokenPlaceholder` (ADMIN_TOKEN only) — **neither plants this key**. The shared MiniMax value was injected by an out-of-core recreation/secret-reuse flow (operator-side DB copy). No core handler reproduces it, so it's outside this PR's scope to fix in code. **Requested CTO action before merge:** rotate/replace the shared `MINIMAX_API_KEY` in the `chloe-dong` / `agents-team` / `hongming` `global_secrets` with each tenant's OWN MiniMax key (or remove it where unused). The other co-mingled-key audit came back clean: all oauth + KIMI values are per-tenant distinct; `jrs-auto` and `go-to-market` have no bypass-list globals at all. (No tenant data was mutated by this audit — read-only.)
agent-reviewer approved these changes 2026-05-28 19:53:31 +00:00
agent-reviewer left a comment
Member

APPROVE — independent falsifying Five-Axis review (built + ran on op-host; HEAD bbb445b9). Merge is GATED on the MiniMax-key rotation (see Security axis); do NOT merge until that pre-merge cleanup is done.

1. Correctness — PASS. (a) effectiveModelForBilling(model, envVars) is the shared SSOT (explicit → MOLECULE_MODELMODEL); applyRuntimeModelEnv now calls it (DRY, no divergence). (b) byok branch leaves tenant creds intact, no proxy override, routes direct. (c) platform_managed branch byte-identical — stripPlatformManagedLLMBypassEnv body + the CP-proxy MOLECULE_LLM_USAGE_TOKEN/ANTHROPIC_BASE_URL injection untouched (diff confirms only comments + the byok-branch above it changed). (d) Dead-code removal clean: stripGlobalOriginLLMCreds deleted, globalKeys param dropped from applyPlatformManagedLLMEnv; globalSecretKeys is still legitimately consumed by findForbiddenTenantEnvKeysFromGlobals (not orphaned). go build ./... + go build -tags=integration ./... + go vet all green → compiler-verified no orphans.

2. Non-regression — PASS. platform_managed metering path unchanged. ReProvisionUsesStoredModel + ReadProvisionParity (#1994 guards) pass. e2e additions are purely additive — diff of test_a2a_e2e.sh shows only + lines; the existing "kind":"text" shape checks remain. DefaultPreservation (no model + no cred ⇒ platform_managed) holds.

3. Tests — PASS. go test -tags=integration ./internal/handlers/ GREEN (16.5s). Offline e2e test_completion_assert_unit.sh 16/16, shellcheck clean, wired into the required "Run E2E bash unit tests (no live infra)" ci.yml job. Mutation checks (both performed, both load-bearing):

  • (i) Re-added the byok-branch global strip → ByokGlobalScopeOAuthSurvives(AndRunsDirect), ReProvisionUsesStoredModel, ByokWithTenantGlobalOAuthSucceeds went RED (global-scope oauth stripped → MISSING_BYOK_CREDENTIAL / HasUsableLLMCred=false), exactly the predicted direction. DefaultPreservation correctly stayed GREEN.
  • (ii) Dropped the rejectPlatformManagedDirectLLMBypass guard from SetGlobalTestSetGlobal_RejectsPlatformBypassKeyOnPlatformManagedTenant went RED (got 500 from the unguarded INSERT instead of the 400 reject).
    Restored to clean head + rebuilt green after each.

4. Security / credential-handling — the crux, and the corrected model checks out under the org-per-EC2 architecture. global_secrets (migration 012: key TEXT UNIQUE, one row-set per workspace-server DB) is a single store per EC2, and each EC2 hosts exactly one tenant → within an instance global_secrets IS that tenant's own shared store. So removing the byok strip is the right model: byok runs on the tenant's own cred at global OR workspace scope, direct.

  • The co-mingling risk is REAL. With the strip gone, a byok-MiniMax workspace in agents-team ("Dev Engineer B (MiniMax)") would run on the platform/operator MiniMax key direct — the operator key in triage-operator/env is confirmed present on the op-host; the PR audit asserts it is byte-identical (sha20 2544574e…, len 125) in chloe-dong/agents-team/hongming global_secrets (JRS-style secret-reuse-via-DB-copy recreation pattern).
  • The PR handles this correctly: it does NOT silently enable the drain — it surfaces the audit and requests CTO rotation BEFORE merge, and adds the SetGlobal reject guard to stop FUTURE in-code co-mingling.
  • Falsifying "the guard is complete": the guard is NOT a complete barrier, and the PR does not overclaim it is. (a) It gates on the SERVER-env MOLECULE_LLM_BILLING_MODE=platform_managed (org default); on a byok-default tenant SetGlobal legitimately ALLOWS a bypass-list key (correct — it's the tenant's own cred). (b) The actual co-mingled MiniMax key was planted by an out-of-core operator-side DB copy (recreation/clone/seed flow), which NO in-core writer reproduces and which the guard cannot intercept. In-core writers are only SetGlobal (guarded) + fixAdminTokenPlaceholder (ADMIN_TOKEN only). So the guard prevents future in-core co-mingling but the recreation/seed vector remains out-of-band — that is precisely why the existing co-mingled key must be rotated as a data-cleanup, not assumed fixed by code.

5. Tier/merge-gate — behavior-affecting credential + provisioning hot path. There IS a pre-merge BLOCKER: the shared MINIMAX_API_KEY in chloe-dong/agents-team/hongming global_secrets must be rotated/replaced with each tenant's OWN key (or removed where unused) BEFORE this merges. Reason: removing the byok strip is safe for every credential that is genuinely the tenant's own (all oauth + KIMI verified per-tenant-distinct in the audit), but the one platform-origin co-mingled MiniMax key would route a byok workspace onto the platform's key the moment the strip is gone. The code is correct; the data is not yet clean.

CTO-flag items:

  1. BLOCKER before merge: rotate the shared MINIMAX_API_KEY out of the three tenants' global_secrets (per-tenant key or removal). Verify agents-team "Dev Engineer B (MiniMax)" specifically before any re-provision.
  2. Follow-up (not blocking): the recreation/clone/seed flow that planted the shared key is out-of-core and ungated — consider an operator-side guard or audit so future tenant recreation can't re-co-mingle a platform key (the SetGlobal guard does not cover that path).

Verdict: code is correct + well-tested; merge GATED on the MiniMax rotation + explicit CTO merge-go. Not merging.

— agent-reviewer (independent build/test/mutation on op-host)

**APPROVE** — independent falsifying Five-Axis review (built + ran on op-host; HEAD bbb445b9). **Merge is GATED on the MiniMax-key rotation (see Security axis); do NOT merge until that pre-merge cleanup is done.** **1. Correctness** — PASS. (a) `effectiveModelForBilling(model, envVars)` is the shared SSOT (explicit → `MOLECULE_MODEL` → `MODEL`); `applyRuntimeModelEnv` now calls it (DRY, no divergence). (b) byok branch leaves tenant creds intact, no proxy override, routes direct. (c) platform_managed branch byte-identical — `stripPlatformManagedLLMBypassEnv` body + the CP-proxy `MOLECULE_LLM_USAGE_TOKEN`/`ANTHROPIC_BASE_URL` injection untouched (diff confirms only comments + the byok-branch above it changed). (d) Dead-code removal clean: `stripGlobalOriginLLMCreds` deleted, `globalKeys` param dropped from `applyPlatformManagedLLMEnv`; `globalSecretKeys` is still legitimately consumed by `findForbiddenTenantEnvKeysFromGlobals` (not orphaned). `go build ./...` + `go build -tags=integration ./...` + `go vet` all green → compiler-verified no orphans. **2. Non-regression** — PASS. platform_managed metering path unchanged. `ReProvisionUsesStoredModel` + `ReadProvisionParity` (#1994 guards) pass. e2e additions are purely additive — diff of `test_a2a_e2e.sh` shows only `+` lines; the existing `"kind":"text"` shape checks remain. `DefaultPreservation` (no model + no cred ⇒ platform_managed) holds. **3. Tests** — PASS. `go test -tags=integration ./internal/handlers/` GREEN (16.5s). Offline e2e `test_completion_assert_unit.sh` 16/16, shellcheck clean, wired into the required "Run E2E bash unit tests (no live infra)" ci.yml job. **Mutation checks (both performed, both load-bearing):** - (i) Re-added the byok-branch global strip → `ByokGlobalScopeOAuthSurvives(AndRunsDirect)`, `ReProvisionUsesStoredModel`, `ByokWithTenantGlobalOAuthSucceeds` went RED (global-scope oauth stripped → MISSING_BYOK_CREDENTIAL / HasUsableLLMCred=false), exactly the predicted direction. `DefaultPreservation` correctly stayed GREEN. - (ii) Dropped the `rejectPlatformManagedDirectLLMBypass` guard from `SetGlobal` → `TestSetGlobal_RejectsPlatformBypassKeyOnPlatformManagedTenant` went RED (got 500 from the unguarded INSERT instead of the 400 reject). Restored to clean head + rebuilt green after each. **4. Security / credential-handling** — the crux, and the corrected model checks out under the **org-per-EC2** architecture. `global_secrets` (migration 012: `key TEXT UNIQUE`, one row-set per workspace-server DB) is a single store per EC2, and each EC2 hosts exactly one tenant → within an instance `global_secrets` IS that tenant's own shared store. So removing the byok strip is the right model: byok runs on the tenant's own cred at global OR workspace scope, direct. - **The co-mingling risk is REAL.** With the strip gone, a byok-MiniMax workspace in `agents-team` ("Dev Engineer B (MiniMax)") would run on the platform/operator MiniMax key direct — the operator key in `triage-operator/env` is confirmed present on the op-host; the PR audit asserts it is byte-identical (sha20 `2544574e…`, len 125) in `chloe-dong`/`agents-team`/`hongming` global_secrets (JRS-style secret-reuse-via-DB-copy recreation pattern). - The PR handles this correctly: it does NOT silently enable the drain — it surfaces the audit and **requests CTO rotation BEFORE merge**, and adds the `SetGlobal` reject guard to stop FUTURE in-code co-mingling. - **Falsifying "the guard is complete":** the guard is NOT a complete barrier, and the PR does not overclaim it is. (a) It gates on the SERVER-env `MOLECULE_LLM_BILLING_MODE=platform_managed` (org default); on a byok-default tenant SetGlobal legitimately ALLOWS a bypass-list key (correct — it's the tenant's own cred). (b) The actual co-mingled MiniMax key was planted by an **out-of-core operator-side DB copy** (recreation/clone/seed flow), which NO in-core writer reproduces and which the guard cannot intercept. In-core writers are only `SetGlobal` (guarded) + `fixAdminTokenPlaceholder` (ADMIN_TOKEN only). So the guard prevents future *in-core* co-mingling but the recreation/seed vector remains out-of-band — that is precisely why the existing co-mingled key must be rotated as a data-cleanup, not assumed fixed by code. **5. Tier/merge-gate** — behavior-affecting credential + provisioning hot path. **There IS a pre-merge BLOCKER:** the shared `MINIMAX_API_KEY` in `chloe-dong`/`agents-team`/`hongming` global_secrets must be rotated/replaced with each tenant's OWN key (or removed where unused) BEFORE this merges. Reason: removing the byok strip is safe for every credential that is genuinely the tenant's own (all oauth + KIMI verified per-tenant-distinct in the audit), but the one platform-origin co-mingled MiniMax key would route a byok workspace onto the platform's key the moment the strip is gone. The code is correct; the data is not yet clean. **CTO-flag items:** 1. **BLOCKER before merge:** rotate the shared `MINIMAX_API_KEY` out of the three tenants' global_secrets (per-tenant key or removal). Verify `agents-team` "Dev Engineer B (MiniMax)" specifically before any re-provision. 2. **Follow-up (not blocking):** the recreation/clone/seed flow that planted the shared key is out-of-core and ungated — consider an operator-side guard or audit so future tenant recreation can't re-co-mingle a platform key (the SetGlobal guard does not cover that path). Verdict: code is correct + well-tested; merge GATED on the MiniMax rotation + explicit CTO merge-go. Not merging. — agent-reviewer (independent build/test/mutation on op-host)
claude-ceo-assistant approved these changes 2026-05-28 19:53:44 +00:00
claude-ceo-assistant left a comment
Owner

APPROVE (second non-author approval). Concur with the agent-reviewer independent Five-Axis review on HEAD bbb445b9: correctness/non-regression/tests all pass, both mutation checks confirmed load-bearing (re-adding the byok strip → byok-survival tests RED; dropping the SetGlobal guard → its regression test RED), platform_managed metering path byte-identical, build + -tags=integration handlers suite green, offline e2e 16/16.

Security axis: the corrected model is right under org-per-EC2 (global_secrets = the tenant's own store per instance). Removing the byok strip is correct, BUT there is a genuine pre-merge BLOCKER — the platform-origin MINIMAX_API_KEY co-mingled into chloe-dong/agents-team/hongming global_secrets must be rotated/removed first, or a byok-MiniMax workspace would run on the platform key direct. The SetGlobal guard prevents future in-core co-mingling but not the out-of-core recreation/DB-copy vector that planted this key.

Merge is GATED on: (1) the MiniMax-key rotation, (2) explicit CTO merge-go. Approving the code; not merging.

— claude-ceo-assistant

**APPROVE** (second non-author approval). Concur with the agent-reviewer independent Five-Axis review on HEAD bbb445b9: correctness/non-regression/tests all pass, both mutation checks confirmed load-bearing (re-adding the byok strip → byok-survival tests RED; dropping the SetGlobal guard → its regression test RED), platform_managed metering path byte-identical, build + `-tags=integration` handlers suite green, offline e2e 16/16. Security axis: the corrected model is right under org-per-EC2 (`global_secrets` = the tenant's own store per instance). Removing the byok strip is correct, BUT there is a genuine pre-merge BLOCKER — the platform-origin `MINIMAX_API_KEY` co-mingled into `chloe-dong`/`agents-team`/`hongming` global_secrets must be rotated/removed first, or a byok-MiniMax workspace would run on the platform key direct. The SetGlobal guard prevents future *in-core* co-mingling but not the out-of-core recreation/DB-copy vector that planted this key. **Merge is GATED on: (1) the MiniMax-key rotation, (2) explicit CTO merge-go.** Approving the code; not merging. — claude-ceo-assistant
hongming merged commit c2c6501a67 into main 2026-05-28 20:01:00 +00:00
Sign in to join this conversation.
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#1995