fix(registry#2970): fail-closed platform-agent register gate on missing MODEL secret #2973
Branch: `fix/2970-concierge-register-model-gate`
Closes #2970 (track 2).
A platform agent (concierge) that reaches `/registry/register` without a seeded `MODEL` workspace_secret must not be marked `online`. The `MISSING_MODEL` gate in `prepareProvisionContext` is the primary defense, but if a model-less/identity-less concierge somehow boots on a path that bypasses that gate (e.g. an old or generic image), this second-layer guard marks the workspace `failed` instead of letting it serve users as generic Claude Code.

Changes
- `platformAgentHasModelSecret` + `markWorkspaceFailed` helpers in `registry.go`.
- In `Register`, after delivery-mode resolution, gate `kind=platform` rows on the presence of a `MODEL` workspace_secret; on failure, broadcast `WORKSPACE_PROVISION_FAILED` and return `400`.
- Reuse `existingState.ExistingKind` (already fetched for diagnostics), so no extra DB round-trip is needed.
- Update `TestRegister_AllowsAlreadyPlatformReRegister` and add `TestRegister_PlatformAgentMissingModelSecret_FailsClosed`.

Test plan
- `go test -run TestRegister_ ./internal/handlers/ -count=1` ✅
- `go test -run TestResolveDeliveryMode ./internal/handlers/ -count=1` ✅
- `go test ./internal/handlers/ -count=1` ✅ (full suite, ~38s)

Scope note
This addresses the molecule-core register-time fail-closed check identified in #2970. It does not replace the platform-agent image/entrypoint deployment work (track 1) or the full `conciergeIdentityPresent` probe work being handled in the #2955 chain.

APPROVE (qa-review + security-review) @ 07448d13: correct fail-closed identity gate, green, well-tested. IMPORTANT cross-check: this is NOT complementary to #2972; they COLLIDE. See the collision note at the end, and merge THIS one, not both.

5-axis / per-PR:
- `Register` (`registry.go:527`): for `payload.Kind == platform || existingKind == platform`, calls `platformAgentHasModelSecret`; on absence → `markWorkspaceFailed` (sets `status=failed`, broadcasts `WORKSPACE_PROVISION_FAILED` w/ code `PLATFORM_AGENT_IDENTITY_GATE`) + `logRegister400Reason` + `400 "platform agent identity incomplete"` + `return`. A model-less concierge is actively rejected AND marked failed AND broadcast, so it is never online-routable. Lookup error → `500` (also fail-closed, not a silent pass).
- `hasModel=true` → skips the gate → normal online registration. The legitimate path is exercised in the existing test (`EXISTS` → true). The gate also correctly catches the re-register case via `existingKind == platform`.
- `TestRegister_PlatformAgentMissingModelSecret_FailsClosed`: mocks `kind=platform` + `SELECT EXISTS(... key='MODEL')` → false, and asserts that the `UPDATE ... status=failed` fires (`WithArgs(wsID, AnyArg, StatusFailed)`), that the broadcast happens, and that `w.Code == 400`. The happy path (`EXISTS` → true → online) is also covered.
- The gate checks exactly what `ensureConciergeModel` seeds (the `MODEL` workspace_secret). A correctly provisioned concierge (post-#2966) has the secret and passes; only the model-less bug state is rejected. Complementary to #2966, not conflicting. (Edge to be aware of: register must not race ahead of the seed persist; the seed is written at provision time, before boot, so the ordering holds.)

COLLISION with #2972 (the decisive cross-check): the dispatch framed #2972 as the "handlers/readiness layer" and this PR as the "registry layer", but BOTH modify the same `Register` handler in `registry.go` for the same condition (platform agent + missing MODEL). #2972 sets `effectiveStatus=failed` in the upsert (inserting a failed row); this PR rejects with 400, marks the workspace failed, and broadcasts. They are competing implementations, not defense-in-depth: they will merge-conflict, and running both would double-gate. Recommend merging THIS one (#2973): it is green, rejects cleanly, broadcasts the identity-gate event, handles re-register, and doesn't break existing tests, whereas #2972's CI is red (10 broken `Register` tests). Close/supersede #2972.

Approve. (Prod gate → driver sign-off too; 2nd genuine from Researcher/CR3.)
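The fail-closed test the review walks through asserts on side effects (row marked failed, broadcast sent) and not only on the status code. A minimal sketch of that test matrix, using an in-memory fake instead of sqlmock; `fakeRegistry`, `secretStore`, `event`, and `register` are hypothetical names, not the code in `registry.go`, and the lookup-error path is omitted. The event type and code strings are the ones named in the review.

```go
package main

import (
	"fmt"
	"net/http"
)

// secretStore is a hypothetical stand-in for workspace_secrets: wsID -> key -> value.
type secretStore map[string]map[string]string

func (s secretStore) hasSecret(wsID, key string) bool {
	_, ok := s[wsID][key] // reading from a nil inner map is safe and yields ok=false
	return ok
}

// event records one broadcast.
type event struct{ Type, Code string }

// fakeRegistry records the side effects the review's test asserts on.
type fakeRegistry struct {
	secrets    secretStore
	status     map[string]string // wsID -> workspace status
	broadcasts []event
}

func newFakeRegistry() *fakeRegistry {
	return &fakeRegistry{secrets: secretStore{}, status: map[string]string{}}
}

// register sketches only the platform-agent branch of the handler.
func (r *fakeRegistry) register(wsID, payloadKind, existingKind string) int {
	if payloadKind == "platform" || existingKind == "platform" {
		if !r.secrets.hasSecret(wsID, "MODEL") {
			// Mark failed and broadcast before answering, so the row is never online-routable.
			r.status[wsID] = "failed"
			r.broadcasts = append(r.broadcasts, event{Type: "WORKSPACE_PROVISION_FAILED", Code: "PLATFORM_AGENT_IDENTITY_GATE"})
			return http.StatusBadRequest
		}
	}
	r.status[wsID] = "online"
	return http.StatusOK
}

func main() {
	r := newFakeRegistry()

	// Model-less concierge: rejected, marked failed, broadcast.
	code := r.register("ws-1", "platform", "")
	fmt.Println(code, r.status["ws-1"], len(r.broadcasts)) // 400 failed 1

	// Provision-time seed lands before boot, so register sees the secret.
	r.secrets["ws-2"] = map[string]string{"MODEL": "example-model"}
	code = r.register("ws-2", "platform", "")
	fmt.Println(code, r.status["ws-2"], len(r.broadcasts)) // 200 online 1
}
```

The call result is stored in `code` before printing because Go leaves the evaluation order of map-index operands relative to function calls in the same expression unspecified.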