fix(secrets): never auto-restart the org's platform root on secret changes (core#2573) #2603
Closes the remaining #2573 gap after the self-write skip.
Why the self-write skip was not enough: the org concierge's management MCP authenticates with the tenant ADMIN token, so `callerWorkspaceID()` returns `""` and the self-write skip never fires. A secret write/delete targeting the `kind='platform'` workspace therefore auto-restarted (terminated and re-provisioned) the org root's box mid-turn. This happened twice on 2026-06-11; the first cost a 14h org-root outage when the provision leg failed silently (cp#691).
Change:
- `autoRestartAllowed()`: self-write skip (existing) plus a new `kind='platform'` skip. Kind lookup fails closed: if the kind can't be proven, no restart. A wrong restart on the org root is the exact outage this guards against; a skipped restart only delays env propagation until the next explicit restart, which the canvas Restart button covers.
- `restartAllAffectedByGlobalKey()`: `COALESCE(kind,'workspace') <> 'platform'` added to the fan-out query, so global-secret rotation can't tear down the org root either.
Pairs with mcp-server#62 (merged): `create_approval`/`create_request` in management mode, so the concierge stops improvising approval demos with gated ops.
Refs core#2573.
🤖 Generated with Claude Code
APPROVED after full 5-axis review of molecule-core#2603 at head `977502be`.
Correctness: the new `autoRestartAllowed()` covers both restart-kill cases: self-write and target `kind='platform'`. It fails closed on kind lookup errors, which is appropriate for this outage class. `Set` and `Delete` now route through the helper, and `restartAllAffectedByGlobalKey` filters on `COALESCE(kind,'workspace') <> 'platform'`, so global secret rotations cannot fan out a restart to the org root.
Robustness: fail-closed lookup means an uncertain target skips auto-restart rather than risking org-root termination; explicit restart remains available for env propagation. Existing restart-positive tests were updated with kind expectations, and new tests cover platform Set/Delete skip plus global fan-out SQL exclusion.
Security: this reduces availability blast radius for secret changes without weakening auth or allowing spoofed caller headers; the #2584 spoof regression tests still expect restart for a valid non-target caller and now include the kind lookup.
Performance: one small kind lookup only on paths that would otherwise auto-restart; fan-out adds an indexed-style predicate on workspace kind. No new loops or blocking calls beyond existing DB work.
Readability: helper name/comment clearly document why platform roots are special and why fail-closed is intentional.
Live status at review time had product checks green or running, with qa-review/security-review expected to clear from this approval and SOP ceremony still separately visible.
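As a side note on the fan-out predicate from the Correctness axis: in SQL, a bare `kind <> 'platform'` evaluates to NULL for rows whose `kind` is NULL, so those rows would drop out of the result, whereas `COALESCE(kind,'workspace')` treats them as ordinary workspaces. The Go sketch below mirrors that semantics in memory. The `workspace` struct and `restartTargets` helper are hypothetical, and whether NULL kinds actually occur in the schema is an assumption.

```go
package main

import "fmt"

// workspace is a hypothetical row shape; Kind is nil when the column is NULL.
type workspace struct {
	ID   string
	Kind *string
}

// effectiveKind mirrors COALESCE(kind, 'workspace').
func effectiveKind(kind *string) string {
	if kind == nil {
		return "workspace"
	}
	return *kind
}

// restartTargets mirrors the predicate COALESCE(kind,'workspace') <> 'platform':
// every affected workspace is a restart target except the platform root.
func restartTargets(rows []workspace) []string {
	var ids []string
	for _, w := range rows {
		if effectiveKind(w.Kind) != "platform" {
			ids = append(ids, w.ID)
		}
	}
	return ids
}

func main() {
	platform, regular := "platform", "workspace"
	rows := []workspace{
		{ID: "root", Kind: &platform},
		{ID: "ws-1", Kind: &regular},
		{ID: "ws-legacy", Kind: nil}, // NULL kind still receives the restart
	}
	fmt.Println(restartTargets(rows)) // [ws-1 ws-legacy]
}
```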